Training methods — korshunov.ai — ML news

arXiv cs.CL · 8d ago

ZPPO: Teacher in Prompts, Not Gradients

Zone of Proximal Policy Optimization (ZPPO) integrates teacher knowledge directly into prompts rather than policy gradients. It uses Binary and Negative Candidate-included Questions to surface student failure modes and amplifies learning through a prompt replay buffer, achieving superior performance on hard questions across student scales, especially at smaller model sizes.
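
A prompt replay buffer can be sketched as below. This is a minimal illustration under assumptions the summary does not state: the buffer stores prompts the student answered incorrectly, evicts the oldest entries when full, and swaps a fixed fraction of each fresh batch for replayed prompts. `PromptReplayBuffer`, `add_failures`, and `mix_batch` are hypothetical names, not ZPPO's code.

```python
import random
from collections import deque


class PromptReplayBuffer:
    """Hypothetical buffer (not ZPPO's code): stores prompts the student failed on for later replay."""

    def __init__(self, capacity: int):
        # deque with maxlen evicts the oldest prompt once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def add_failures(self, prompts, correct_flags):
        # Assumption: only prompts answered incorrectly are worth replaying
        for prompt, ok in zip(prompts, correct_flags):
            if not ok:
                self.buffer.append(prompt)

    def mix_batch(self, fresh_prompts, replay_fraction, rng):
        # Replace the tail of the fresh batch with prompts sampled from the buffer
        n_replay = min(len(self.buffer), int(replay_fraction * len(fresh_prompts)))
        replayed = rng.sample(list(self.buffer), n_replay)
        return fresh_prompts[: len(fresh_prompts) - n_replay] + replayed


buf = PromptReplayBuffer(capacity=2)
buf.add_failures(["q1", "q2", "q3"], [False, True, False])
buf.add_failures(["q4"], [False])  # "q1" is evicted
batch = buf.mix_batch(["a", "b", "c", "d"], replay_fraction=0.5, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the replay sampling reproducible across runs.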

arXiv cs.CL · 8d ago

Variable-Width Transformers Outperform Uniform Architectures

A new ×-shaped transformer architecture varies layer width with depth, widening early and late layers while narrowing the middle ones. The lower average layer width yields 22% fewer FLOPs and 15% less KV cache memory, and the architecture outperforms uniform-width baselines on language modeling loss across models from 200M to 2B parameters.
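
The source of the savings can be illustrated with back-of-the-envelope arithmetic. The sketch assumes a hypothetical 12-layer schedule whose width falls linearly from 1280 at the outer layers to 640 in the middle, a KV cache per token proportional to 2 × width per layer (keys plus values), and per-layer matmul FLOPs proportional to width squared. It ignores the projections needed between layers of different width and is not the paper's configuration, so its ratios will not match the reported 22% and 15%.

```python
def hourglass_widths(num_layers: int, outer: int, inner: int) -> list[int]:
    """Hypothetical schedule: width shrinks linearly from `outer` at both ends to `inner` at the centre."""
    mid = (num_layers - 1) / 2
    widths = []
    for i in range(num_layers):
        t = abs(i - mid) / mid  # 1.0 at the first and last layer, near 0.0 in the middle
        widths.append(round(inner + (outer - inner) * t))
    return widths


def kv_floats_per_token(widths: list[int]) -> int:
    # Keys plus values: assumes K and V each have the same dimension as the layer width
    return sum(2 * w for w in widths)


def matmul_flops_proxy(widths: list[int]) -> int:
    # Dense projections in a layer of width d cost roughly proportional to d * d per token
    return sum(w * w for w in widths)


uniform = [1024] * 12
shaped = hourglass_widths(12, outer=1280, inner=640)
kv_ratio = kv_floats_per_token(shaped) / kv_floats_per_token(uniform)
flops_ratio = matmul_flops_proxy(shaped) / matmul_flops_proxy(uniform)
```

Both ratios come out below 1 even though the outer layers are wider than the uniform baseline, because the narrow middle layers outnumber the wide ends.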

arXiv cs.LG · 8d ago

MGUP: Momentum-Gradient Alignment for Selective Optimization

MGUP introduces a selective update mechanism for stochastic optimization that applies larger step sizes to a fixed proportion of parameters and smaller, non-zero step sizes to the rest. It integrates with optimizers such as AdamW, Lion, and Muon, comes with theoretical convergence guarantees for MGUP-AdamW, and shows superior or more stable performance in large language model training and MAE pretraining.
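
A selective update of this kind can be sketched on top of plain momentum SGD. The selection rule below, ranking parameters by the elementwise product of momentum and gradient, is an assumption read off the title rather than the paper's published criterion, and `mgup_style_step`, `fraction`, and `small_scale` are hypothetical names. The paper applies the mechanism to AdamW, Lion, and Muon; momentum SGD is used here only to keep the sketch short.

```python
def mgup_style_step(params, grads, momenta, lr=0.1, small_scale=0.1, beta=0.9, fraction=0.5):
    """Hypothetical selective update: a fixed fraction of parameters gets the full step size."""
    # Exponential moving average of gradients (momentum)
    new_m = [beta * m + (1 - beta) * g for m, g in zip(momenta, grads)]
    # Alignment score: positive when momentum and the current gradient agree in sign
    scores = [m * g for m, g in zip(new_m, grads)]
    k = max(1, round(fraction * len(params)))
    ranked = sorted(range(len(params)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:k])
    new_params = []
    for i, (p, m) in enumerate(zip(params, new_m)):
        # Selected parameters take the full step; the rest take a smaller but non-zero step
        step = lr if i in selected else lr * small_scale
        new_params.append(p - step * m)
    return new_params, new_m, selected


params, momenta, selected = mgup_style_step(
    params=[1.0, 1.0, 1.0, 1.0],
    grads=[1.0, -1.0, 1.0, -1.0],
    momenta=[1.0, 1.0, -1.0, -1.0],
)
```

Keeping `small_scale` above zero means unselected parameters still move, matching the summary's "smaller, non-zero step sizes for the rest".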

arXiv cs.LG · 8d ago

ReLAR: Reinforcement-Guided Latent Refinement for Stable LLM Reasoning

ReLAR introduces a reinforcement-guided framework that iteratively refines hidden states to improve LLM reasoning stability. It uses learned depth and action controllers trained via policy gradients to adaptively determine refinement steps, achieving better accuracy and generation quality with lower inference overhead than explicit reasoning methods.

arXiv cs.LG · 8d ago

NMF with Topological Regularisation for Interpretable Bases

A new method integrates persistent homology into non-negative matrix factorisation to regularise the topology of the basis functions. Adding threshold-free topological scores as regularisers to the NMF objective yields bases such as spatially coherent image components, periodic time series, and clique-like graph signals.

arXiv cs.LG · 8d ago

CARLOS: Deep RL for Continuous-time Optimal Stopping

CARLOS uses an aggregate deep neural network to learn a joint space-time exercise boundary for optimal stopping problems. It progressively refines stopping decisions at finer time resolutions and employs adaptive sampling to focus training near the stopping boundary. Benchmarked results show CARLOS outperforms existing Bermudan solvers, approaching the American upper bound with high efficiency.

arXiv cs.LG · 8d ago

Reversal Q-Learning: A New Off-Policy RL Algorithm

Reversal Q-Learning (RQL) is a new off-policy reinforcement learning algorithm that trains a flow policy using prior data. By modeling flow refinement steps as actions in an expanded Markov decision process and applying virtual on-policy trajectories via reversal, RQL enables effective offline learning without backpropagation through time. Experiments on 50 robotic tasks show RQL achieves the best average performance among state-of-the-art flow-based offline RL methods.

arXiv cs.LG · 8d ago

SCBoost: Reducing Learner Redundancy via Residual Orthogonalization

SCBoost introduces residual orthogonalization to eliminate learner redundancy in boosting. It uses Spectral Residual Projection and Covariance-Regularized Weighting to ensure each learner captures novel error components and reduces ensemble correlations. Theoretical analysis and experiments show improved accuracy and F1 scores on ten benchmark datasets.
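
The core idea of residual orthogonalization can be illustrated with Gram-Schmidt: before a new learner's output joins the ensemble, its component along every earlier (already orthogonalized) learner's output is removed, so it only contributes a direction the ensemble does not yet cover. For mean-centred outputs, a zero inner product also means zero correlation. This is a generic sketch, not the paper's Spectral Residual Projection or Covariance-Regularized Weighting; learner outputs are represented as plain prediction vectors over the training set.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(pred, previous):
    """Remove from `pred` its projection onto each earlier, mutually orthogonal learner output."""
    out = list(pred)
    for q in previous:
        qq = dot(q, q)
        if qq == 0:
            continue  # a learner that predicts all zeros spans no direction
        coef = dot(out, q) / qq
        out = [o - coef * qi for o, qi in zip(out, q)]
    return out


ensemble = []
for raw in ([1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 3.0, 4.0]):
    ensemble.append(orthogonalize(raw, ensemble))
```

The third raw output overlaps heavily with the first two, yet only its novel component `[0.0, 0.0, 4.0]` is kept.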

arXiv cs.LG · 8d ago

Credit-in-Event: Re-Anchoring Event Credit in Dynamics Models

Credit-in-Event identifies temporal credit dilution in learned dynamics models and addresses it with CREST, a label-free and training-free readout that re-anchors pooled representations by estimating transient event cores and applying an event-versus-rest contrast. This reduces out-of-distribution error across multiple systems and data types. Ablations confirm that the improvement stems from event-core credit re-anchoring, not from generic locality or stability priors.

arXiv cs.LG · 8d ago

SelFix: Root-Selecting Fixed-Point Inversion for Rectified Flows via Trajectory Straightness

SelFix improves fixed-point inversion by selecting solutions that produce straighter inverse trajectories, enhancing real-image reconstruction and source-preserving editing. Experiments on FLUX.1-dev and PIE-Bench show it outperforms prior baselines in both reconstruction quality and editing fidelity.
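
One way to score trajectory straightness is the ratio of the straight-line distance between a trajectory's endpoints to its total path length, which equals 1.0 only for a straight path. The sketch assumes this score and picks, among candidate fixed-point roots, the one whose inverse trajectory is straightest. The metric, the toy 2-D trajectories, and `select_root` are illustrative assumptions, not SelFix's actual criterion or code.

```python
import math


def path_length(traj):
    # Sum of segment lengths between consecutive points
    return sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))


def straightness(traj):
    """Endpoint distance divided by path length: 1.0 for a straight path, lower when it bends."""
    total = path_length(traj)
    if total == 0:
        return 1.0
    return math.dist(traj[0], traj[-1]) / total


def select_root(candidates):
    """candidates: list of (root, trajectory) pairs; return the root with the straightest trajectory."""
    return max(candidates, key=lambda c: straightness(c[1]))[0]


straight = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
bent = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
best = select_root([("root_a", bent), ("root_b", straight)])
```

In a real rectified-flow inversion each point would be a full latent tensor rather than a 2-D coordinate; the score itself is unchanged.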

arXiv cs.LG · 8d ago

Risk Decomposition Framework for Pre-Hoc Fine-Tuning Prediction

A new framework decomposes the risk of pre-hoc fine-tuning prediction into intrinsic limits and optimization variance. It proves a necessary lower bound on variance decay, introduces a budget-optimal probing strategy, and is validated on synthetic and real-world benchmarks in three distinct prediction regimes.

arXiv cs.LG · 8d ago

Learnable Graph Patches for Feature Heterogeneity

The authors propose learnable graph patches as the smallest semantic units in graph data, addressing feature heterogeneity without relying on textual information. The framework uses patch encoders and aggregators to extract and combine knowledge across domains, enabling universal pre-training whose downstream performance improves with more pre-training data.

arXiv cs.LG · 8d ago

EnvRL: Leveraging Environment Dynamics in Agentic RL

EnvRL introduces a framework that enhances agentic reinforcement learning by incorporating environment dynamics through state prediction and inverse dynamics objectives. When trained with GRPO, EnvRL improves success rates of Qwen-2.5-1.5B-Instruct from 72.8% to 77.4% on ALFWorld and from 56.8% to 67.0% on WebShop.
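
For context, GRPO replaces a learned value baseline with a group-relative one: several responses are sampled for the same prompt, and each response's reward is normalized by the mean and standard deviation of its group. The sketch shows only that standard advantage computation; EnvRL's state-prediction and inverse-dynamics objectives are not modeled, and using the population standard deviation plus a small `eps` is one common convention among several.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples drawn for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation of the group
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one task: two succeed (reward 1), two fail (reward 0)
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
uniform_group = group_relative_advantages([1.0, 1.0, 1.0])
```

When every rollout in a group earns the same reward, all advantages are zero, so that group contributes no policy-gradient signal.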

arXiv cs.LG · 8d ago

Confusion-Aware Transfer Teacher Curriculum Learning Framework

A confusion-aware difficulty score is introduced within the Transfer Teacher framework to improve model interpretability and data efficiency. On CIFAR-10, confusion-aware curriculum ordering outperforms random ordering by up to 8.7% when training on 20% of the data, with consistent data-efficiency gains. However, neither curriculum nor anti-curriculum ordering improves accuracy over standard training on the full dataset, indicating that better scoring functions alone cannot overcome the failure modes of curriculum learning.
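
A confusion-aware difficulty score can take many forms; the sketch assumes one simple version, one minus the margin between a teacher's two most probable classes, so examples the teacher finds ambiguous count as hard. It then orders examples easy-to-hard (curriculum) or hard-to-easy (anti-curriculum) and keeps the first 20% for a data-limited run. `confusion_score` and `curriculum_order` are hypothetical names, and the summary does not specify the paper's exact score.

```python
def confusion_score(probs):
    """Hypothetical difficulty: 1 minus the gap between the two most probable classes."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)


def curriculum_order(teacher_probs, anti=False):
    # Easy-to-hard by default; anti=True gives hard-to-easy (anti-curriculum)
    return sorted(range(len(teacher_probs)),
                  key=lambda i: confusion_score(teacher_probs[i]), reverse=anti)


teacher_probs = [
    [0.90, 0.05, 0.05],  # confident: easy
    [0.40, 0.35, 0.25],  # top two classes nearly tied: hard
    [0.60, 0.30, 0.10],
    [0.80, 0.15, 0.05],
    [0.50, 0.40, 0.10],
]
order = curriculum_order(teacher_probs)
subset = order[: max(1, int(0.2 * len(order)))]  # first 20% of the ordering
```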

arXiv cs.LG · 8d ago

Lightweight Experiential Latent Memories for Continual Self-Improvement

A new method enables large language models to learn from their own reasoning traces without external supervision. By distilling inference-time computation into lightweight, modular latent memories, the model achieves performance competitive with full training and outperforms zero-shot and raw ICL baselines on mathematical reasoning tasks, with minimal computational overhead.

arXiv cs.LG · 8d ago

Conservation Laws for Modern Neural Architectures

This paper introduces a unified framework to identify conservation laws in gradient flow for modern neural architectures. It covers feedforward networks with GELU, SiLU, and SwiGLU activations, multihead attention with sinusoidal and rotary positional encodings, and Mixture-of-Experts models under various gating schemes. Experiments validate the predicted invariants, supporting the theoretical findings.

arXiv cs.LG · 8d ago

AnchorKV: Safety-Aware KV Cache Compression with Refusal Anchor

AnchorKV introduces a soft penalty mechanism that biases KV cache token retention away from harmful prompt directions. It uses a layer-specific anchor in key-projection space, derived via representation engineering, to improve safety alignment with little loss of utility, offering a drop-in defense against jailbreak attacks.
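
The soft-penalty idea can be sketched as a scoring rule for KV cache eviction: each cached token keeps its usual importance score (for example, accumulated attention) minus a penalty proportional to how closely its key aligns with an anchor direction, and only the top-`budget` tokens are retained. Everything here is an assumption for illustration: the importance values, the 2-D keys, the `anchor` vector, the cosine-similarity penalty, and the function names are hypothetical, and AnchorKV derives its layer-specific anchor via representation engineering rather than by hand.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def retained_tokens(keys, importance, anchor, budget, penalty=1.0):
    """Keep the `budget` tokens with the highest penalized score, returned in sequence order."""
    # Soft penalty: tokens whose keys point along the anchor are down-weighted, not banned
    scores = [imp - penalty * cosine(k, anchor) for k, imp in zip(keys, importance)]
    ranked = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:budget])


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
importance = [1.0, 0.9, 0.8]
anchor = [1.0, 0.0]  # hypothetical direction to bias retention away from
plain = retained_tokens(keys, importance, anchor, budget=2, penalty=0.0)
biased = retained_tokens(keys, importance, anchor, budget=2, penalty=1.0)
```

Because the penalty is soft, a sufficiently important token can still survive even when its key is aligned with the anchor; setting `penalty=0.0` recovers plain importance-based eviction.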

arXiv cs.LG · 8d ago

MKAN: Monotonic Kolmogorov-Arnold Networks with Hard Monotonicity

MKAN introduces a Kolmogorov-Arnold Network with hard monotonicity guaranteed for all parameter values, achieved through exponential reparameterization, positive edge weights, and a monotone base activation. It enables standard gradient descent training and provides a representation-cost theorem showing that any feature extractor can be realized with monotone structure at a size no more than twice the original, offering a principled scaling rule for monotone encoders.
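
The monotonicity argument can be checked numerically. In the sketch, each KAN-style edge function is a sum of shifted `tanh` terms with coefficients `exp(theta)`, so every coefficient is positive for any real `theta`. A positive combination of increasing functions is increasing, and sums and compositions of increasing functions stay increasing, so the two-layer network is non-decreasing in every input whatever the parameters. The `tanh` basis, the parameter layout, and the function names are assumptions for illustration, not MKAN's actual parameterization.

```python
import math


def monotone_edge(x, thetas, centers):
    """Hypothetical edge function: exp(theta) > 0 weights on increasing tanh terms."""
    return sum(math.exp(th) * math.tanh(x - c) for th, c in zip(thetas, centers))


def monotone_kan_layer(xs, edge_params):
    """edge_params[j][i] = (thetas, centers) for the edge from input i to output j."""
    return [sum(monotone_edge(x, *edge_params[j][i]) for i, x in enumerate(xs))
            for j in range(len(edge_params))]


# Arbitrary parameters, including negative thetas: monotonicity does not depend on their values
layer1 = [
    [([0.0, -1.0], [-0.5, 0.5]), ([-0.5], [0.0])],
    [([-2.0], [1.0]), ([0.3, 0.1], [-1.0, 0.2])],
    [([0.5], [0.0]), ([-1.0], [0.5])],
]
layer2 = [
    [([0.0], [0.0]), ([-0.7], [1.0]), ([0.2, -0.3], [-0.5, 0.5])],
]


def network(xs):
    return monotone_kan_layer(monotone_kan_layer(xs, layer1), layer2)[0]


sweep = [network([x / 4, 0.3]) for x in range(-8, 9)]  # vary input 0, hold input 1 fixed
```

Since the guarantee holds for every `theta`, the parameters can be trained with ordinary unconstrained gradient descent, which is the property the summary highlights.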

arXiv cs.LG · 8d ago

Dimensionality Controls When Modularity Helps in Continual Learning

Modular architectures enhance compositional continual learning only in low-dimensional regimes, where representational subspaces partially align for similar tasks. In high-dimensional regimes, modular and single networks perform similarly, indicating that the benefit of modularity depends on the representational dimensionality induced by the initialization scale.

arXiv cs.LG · 8d ago

KANLib: A Modular and Efficient Kolmogorov-Arnold Network Framework

KANLib introduces a modular, extensible, and computationally efficient framework for Kolmogorov-Arnold Networks. It unifies core concepts from PyKAN, EfficientKAN, and FastKAN, supporting adaptive grid rescaling and fine-grained architectural customization while maintaining PyTorch compatibility. Experiments on the California Housing dataset show KANLib achieves competitive efficiency and reproduces established KAN performance.