Inference efficiency
arxiv arXiv cs.LG · 8d ago

MGUP: Momentum-Gradient Alignment for Selective Optimization

MGUP introduces a selective update mechanism for stochastic optimization: at each step it applies larger step sizes to a fixed proportion of parameters and smaller, non-zero step sizes to the rest. It integrates with optimizers such as AdamW, Lion, and Muon. The paper provides theoretical convergence guarantees for MGUP-AdamW and reports superior or more stable performance in large language model training and MAE pretraining.
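The selective update described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the summary does not say how parameters are selected, so the scoring rule here (elementwise product of momentum and gradient, suggested only by the title), the momentum-SGD base optimizer, and all names and default values (`mgup_style_scales`, `top_fraction`, `high`, `low`) are assumptions made for illustration.

```python
def mgup_style_scales(grad, momentum, top_fraction=0.5, high=1.0, low=0.1):
    """Return one step-size multiplier per parameter.

    Assumption (not from the source): alignment is scored as
    momentum[i] * grad[i]. The top `top_fraction` of entries by score get
    the `high` multiplier; every other entry gets the smaller but
    non-zero `low` multiplier.
    """
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    if not 0.0 < low <= high:
        raise ValueError("need 0 < low <= high")
    scores = [m * g for m, g in zip(momentum, grad)]
    k = max(1, int(top_fraction * len(scores)))
    # Indices of the k highest alignment scores.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:k])
    return [high if i in selected else low for i in range(len(scores))]


def selective_momentum_step(params, grad, momentum, lr=0.1, beta=0.9, **scale_kwargs):
    """One momentum-SGD step with selective per-parameter step sizes.

    Momentum-SGD stands in for AdamW, Lion, or Muon to keep the sketch short.
    """
    new_momentum = [beta * m + (1.0 - beta) * g for m, g in zip(momentum, grad)]
    scales = mgup_style_scales(grad, new_momentum, **scale_kwargs)
    new_params = [p - lr * s * m for p, s, m in zip(params, scales, new_momentum)]
    return new_params, new_momentum
```

Because `low` must be strictly positive, no parameter is ever frozen; the unselected parameters simply move more slowly, which matches the "smaller, non-zero step-sizes" wording of the summary.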

arxiv arXiv cs.AI · 8d ago

Embedded ML Workflow for Microcontroller Edge Devices

This paper outlines a systems-oriented workflow for embedded machine learning on microcontroller-class devices. Using inertial motion recognition and keyword spotting as case studies, it details key engineering decisions: data sampling, feature extraction, validation under class imbalance, model-runtime co-design, and streaming deployment. It also provides practical design rules for robust on-device inference, covering data curation, quantization, thresholding, scheduling, and field monitoring.
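To make the quantization step concrete, here is a minimal Python sketch of standard affine int8 quantization, the kind commonly used for on-device inference. The summary does not say which scheme the paper uses, so this is a generic illustration; the function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(values):
    """Affine-quantize a list of floats to int8.

    Returns (q, scale, zero_point) such that
    value is approximately scale * (q - zero_point).
    """
    # Force the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # all-zero input: any positive scale works
    zero_point = round(-128 - lo / scale)
    # Round to the nearest step, shift by the zero point, clamp to int8.
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_int8(q, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [scale * (x - zero_point) for x in q]
```

A round trip through `quantize_int8` and `dequantize_int8` should reproduce each input to within about one quantization step (`scale`), which is the accuracy cost traded for a 4x smaller footprint relative to float32.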

media r/LocalLLaMA · 8d ago

Cheapest hardware for Qwen 3.6: 27B and 35B-A3B models

A Reddit post discusses a cost-effective hardware setup for running the Qwen 3.6 27B and 35B-A3B models. It argues that an RTX 3090 24GB offers better long-term value than a Tesla V100, citing discontinuation and upcoming Chinese alternatives. The proposed build totals $1,995.65 and includes a Ryzen 5 5600X, an RTX 3090 24GB, and other essential components; keeping the total price low is the main concern.