arXiv cs.LG · 5h ago

AsyncOPD: How Stale Can On-Policy Distillation Be?

This article presents AsyncOPD, a fully asynchronous on-policy distillation pipeline that decouples rollout generation from learner updates to alleviate training bottlenecks in large language model post-training. The authors provide the first systematic study of staleness effects in this context, demonstrating that teacher-weighted forward KL is robust to stale rollouts while student-weighted reverse KL is vulnerable.

arXiv cs.LG · 6h ago

Lightweight Transformer Models for On-Device Fault Detection: A Benchmark Study on Resource-Constrained Deployment

This study benchmarks traditional machine learning methods against lightweight transformer architectures for binary fault detection across three public datasets, evaluating tradeoffs between accuracy, model size, and latency. Classification performance is measured with F1-score and AUC, and the study also tests INT8 dynamic quantization and a two-stage adaptive inference pipeline for deployment on resource-constrained hardware.