Training methods
arXiv cs.LG · 8d ago

MGUP: Momentum-Gradient Alignment for Selective Optimization

MGUP introduces a selective update mechanism for stochastic optimization that applies larger step-sizes to a fixed proportion of parameters and smaller, non-zero step-sizes to the rest. It can be combined with optimizers such as AdamW, Lion, and Muon. The paper provides theoretical convergence guarantees for MGUP-AdamW and reports superior or more stable performance in large language model training and MAE pretraining tasks.
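To make the selective-update idea concrete, here is a minimal sketch in NumPy. It is not the paper's algorithm: it uses plain momentum SGD rather than AdamW, Lion, or Muon, and it assumes the selected coordinates are those with the largest elementwise momentum-gradient product, which is one reading of "momentum-gradient alignment". The function name `selective_step` and the parameters `ratio`, `boost`, and `damp` are hypothetical.

```python
import numpy as np

def selective_step(params, grad, momentum, lr=0.1, beta=0.9,
                   ratio=0.25, boost=2.0, damp=0.5):
    """One momentum-SGD step with a per-coordinate step-size scale.

    Illustrative only: `ratio`, `boost`, and `damp` are hypothetical names,
    and the top-k alignment rule is an assumption, not the paper's exact rule.
    """
    # Exponential moving average of the gradient.
    momentum = beta * momentum + (1.0 - beta) * grad
    # Assumed alignment score: elementwise product of momentum and gradient.
    alignment = (momentum * grad).ravel()
    # Fixed proportion of coordinates that receive the larger step-size.
    k = max(1, int(round(ratio * params.size)))
    top = np.argpartition(alignment, -k)[-k:]
    # Every coordinate is updated; the unselected ones get a smaller, non-zero scale.
    scale = np.full(params.size, damp)
    scale[top] = boost
    scale = scale.reshape(params.shape)
    new_params = params - lr * scale * momentum
    return new_params, momentum, scale
```

Calling `selective_step(np.zeros(8), np.arange(1.0, 9.0), np.zeros(8))` marks the two coordinates with the largest alignment for the boosted step and still moves the other six, only by less.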

arXiv cs.LG · 8d ago

Reversal Q-Learning: A New Off-Policy RL Algorithm

Reversal Q-Learning (RQL) is a new off-policy reinforcement learning algorithm that trains a flow policy using prior data. By modeling flow refinement steps as actions in an expanded Markov decision process and applying virtual on-policy trajectories via reversal, RQL enables effective offline learning without backpropagation through time. Experiments on 50 robotic tasks show RQL achieves the best average performance among state-of-the-art flow-based offline RL methods.

arXiv cs.LG · 8d ago

Credit-in-Event: Re-Anchoring Event Credit in Dynamics Models

A new method called Credit-in-Event identifies and addresses temporal credit dilution in learned dynamics models. CREST, a label-free and training-free readout, re-anchors pooled representations by estimating transient event cores and applying event-versus-rest contrast, reducing out-of-distribution error across multiple systems and data types. Ablations confirm the improvement stems from event-core credit re-anchoring, not generic locality or stability priors.

arXiv cs.LG · 8d ago

Confusion-Aware Transfer Teacher Curriculum Learning Framework

The paper introduces a confusion-aware difficulty score within the Transfer Teacher framework to improve model interpretability and data efficiency. Evaluations on CIFAR-10 show that confusion-aware curriculum ordering outperforms random ordering by up to 8.7% when training on 20% of the data, demonstrating consistent data-efficiency gains. However, neither curriculum nor anti-curriculum ordering improves accuracy over standard training on the full dataset, indicating that a better scoring function alone is insufficient to overcome curriculum learning failure modes.

arXiv cs.LG · 8d ago

MKAN: Monotonic Kolmogorov-Arnold Networks with Hard Monotonicity

MKAN introduces a Kolmogorov-Arnold Network with hard monotonicity guaranteed for all parameter values, achieved through exponential reparameterization, positive edge weights, and a monotone base activation. It enables standard gradient descent training and provides a representation-cost theorem showing that any feature extractor can be realized with monotone structure at a size no more than twice the original, offering a principled scaling rule for monotone encoders.
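The exponential-reparameterization idea can be illustrated with a small NumPy sketch. This is a simplification, not MKAN itself: each edge function is reduced to a positive weight times a fixed monotone activation (`tanh` here, chosen as an assumption), with no splines. The name `monotone_layer` and the parameters `theta` and `bias` are hypothetical.

```python
import numpy as np

def monotone_layer(x, theta, bias):
    """Layer that is non-decreasing in every input for any real `theta`.

    Computes y_j = sum_i exp(theta[i, j]) * tanh(x_i) + bias_j.
    Simplified illustration: real KAN edges are learned univariate functions.
    """
    # exp() maps unconstrained parameters to strictly positive edge weights,
    # so ordinary gradient descent on theta can never break monotonicity.
    weights = np.exp(theta)
    # tanh is a monotone base activation; a positive combination stays monotone.
    return np.tanh(x) @ weights + bias

rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 2))   # unconstrained parameters
bias = rng.normal(size=2)
x = rng.normal(size=(5, 3))
x_up = x.copy()
x_up[:, 1] += 0.5                 # raise one input coordinate
y, y_up = monotone_layer(x, theta, bias), monotone_layer(x_up, theta, bias)
```

Because the constraint holds by construction instead of through a penalty term, `y_up >= y` for every output and every choice of `theta`. Stacking such layers preserves the property, since a composition of non-decreasing maps is non-decreasing.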

arXiv cs.LG · 8d ago

KANLib: A Modular and Efficient Kolmogorov-Arnold Network Framework

KANLib introduces a modular, extensible, and computationally efficient framework for Kolmogorov-Arnold Networks. It unifies core concepts from PyKAN, EfficientKAN, and FastKAN, supporting adaptive grid rescaling and fine-grained architectural customization while maintaining PyTorch compatibility. Experiments on the California Housing dataset show KANLib achieves competitive efficiency and reproduces established KAN performance.