Training methods — korshunov.ai

EnvRL: Leveraging Environment Dynamics in Agentic RL

EnvRL introduces a framework that enhances agentic reinforcement learning by incorporating environment dynamics through state prediction and inverse dynamics objectives. When trained with GRPO, EnvRL improves success rates of Qwen-2.5-1.5B-Instruct from 72.8% to 77.4% on ALFWorld and from 56.8% to 67.0% on WebShop.

arXiv cs.LG · 8d ago
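
For readers unfamiliar with GRPO, mentioned in the EnvRL summary above: as it is commonly described, each rollout's reward is normalized against the other rollouts sampled for the same task. The sketch below shows only that normalization step. The function name, the `eps` constant, the use of the population standard deviation, and the example rewards are illustrative assumptions; EnvRL's state-prediction and inverse-dynamics objectives are not shown, since the summary does not specify them.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group (GRPO-style).

    Assumption: population standard deviation; implementations differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of 4 rollouts for one task: 1.0 = success, 0.0 = failure.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Successful rollouts receive positive advantages and failed ones negative advantages, so the policy update favors behavior that beats the group average.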

Confusion-Aware Transfer Teacher Curriculum Learning Framework

A confusion-aware difficulty score is introduced within the Transfer Teacher framework to improve model interpretability and data efficiency. Evaluations on CIFAR-10 show that confusion-aware curriculum ordering outperforms random ordering by up to 8.7% at 20% data, demonstrating consistent data-efficiency gains. However, curriculum or anti-curriculum ordering does not improve accuracy over standard training at full data, indicating that scoring function improvements alone are insufficient to overcome curriculum learning failure modes.

arXiv cs.LG · 8d ago

Lightweight Experiential Latent Memories for Continual Self-Improvement

A new method enables large language models to learn from their own reasoning traces without external supervision. By distilling inference-time computation into lightweight, modular latent memories, the model achieves performance competitive with full training and outperforms zero-shot and raw ICL baselines on mathematical reasoning tasks, with minimal computational overhead.

arXiv cs.LG · 8d ago

Conservation Laws for Modern Neural Architectures

This paper introduces a unified framework to identify conservation laws in gradient flow for modern neural architectures. It covers feedforward networks with GELU, SiLU, and SwiGLU activations, multihead attention with sinusoidal and rotary positional encodings, and Mixture-of-Experts models under various gating schemes. Experiments validate the predicted invariants, supporting the theoretical findings.

arXiv cs.LG · 8d ago

AnchorKV: Safety-Aware KV Cache Compression with Refusal Anchor

AnchorKV introduces a soft penalty mechanism to bias KV cache token retention away from harmful prompt directions. It uses a layer-specific key projection space anchor derived from representation engineering to improve safety alignment without sacrificing much utility, offering a drop-in solution that enhances defense against jailbreak attacks.

arXiv cs.LG · 8d ago

MKAN: Monotonic Kolmogorov-Arnold Networks with Hard Monotonicity

MKAN introduces a Kolmogorov-Arnold Network with hard monotonicity guaranteed for all parameter values, achieved through exponential reparameterization, positive edge weights, and a monotone base activation. It enables standard gradient descent training and provides a representation-cost theorem showing that any feature extractor can be realized with monotone structure at a size no more than twice the original, offering a principled scaling rule for monotone encoders.

arXiv cs.LG · 8d ago
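
The three ingredients named in the MKAN summary (exponential reparameterization, positive edge weights, a monotone base activation) can be illustrated with a toy one-dimensional edge function. This is a minimal sketch, not the paper's parameterization: the function `monotone_edge`, the choice of `tanh` as the base activation, and the random parameters are all assumptions made for illustration.

```python
import math
import random


def monotone_edge(x, raw_weights, raw_slopes, biases):
    """Toy edge function that is non-decreasing in x for any raw parameters.

    exp() maps each unconstrained raw parameter to a positive weight or
    slope, and tanh is monotone, so every term is non-decreasing in x.
    """
    total = 0.0
    for raw_w, raw_s, b in zip(raw_weights, raw_slopes, biases):
        total += math.exp(raw_w) * math.tanh(math.exp(raw_s) * x + b)
    return total


# Hypothetical unconstrained parameters, as plain gradient descent would see them.
rng = random.Random(0)
raw_weights = [rng.uniform(-2, 2) for _ in range(5)]
raw_slopes = [rng.uniform(-2, 2) for _ in range(5)]
biases = [rng.uniform(-2, 2) for _ in range(5)]

xs = [i / 10 for i in range(-30, 31)]
ys = [monotone_edge(x, raw_weights, raw_slopes, biases) for x in xs]

# Allow a tiny tolerance for floating-point rounding.
is_monotone = all(b >= a - 1e-12 for a, b in zip(ys, ys[1:]))
```

Because positivity is enforced by the reparameterization rather than by a constraint or penalty, no projection step is needed during training, which is consistent with the summary's point about standard gradient descent.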

Dimensionality Controls When Modularity Helps in Continual Learning

A modular architecture improves compositional continual learning only in low-dimensional regimes, where the representational subspaces of similar tasks partially align. In high-dimensional regimes, modular and single networks perform similarly, indicating that the benefit of modularity depends on the representational dimensionality induced by the initialization scale.

arXiv cs.LG · 8d ago

KANLib: A Modular and Efficient Kolmogorov-Arnold Network Framework

KANLib introduces a modular, extensible, and computationally efficient framework for Kolmogorov-Arnold Networks. It unifies core concepts from PyKAN, EfficientKAN, and FastKAN, supporting adaptive grid rescaling and fine-grained architectural customization while maintaining PyTorch compatibility. Experiments on the California Housing dataset show KANLib achieves competitive efficiency and reproduces established KAN performance.

arXiv cs.LG · 8d ago

SoftMoE: Soft Differentiable Routing for Mixture-of-Experts in LLMs

SoftMoE replaces discrete top-k routing with a differentiable soft top-k LapSum relaxation, enabling gradient-based optimization of expert selection. It learns to allocate expert activation non-uniformly across layers, with later layers activating more experts, while using significantly fewer experts than traditional sparse MoE.

arXiv cs.LG · 8d ago
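
To make the idea of a soft top-k concrete, the sketch below uses a generic sigmoid-threshold relaxation: each expert gets a gate in (0, 1), and a shared threshold is chosen so the gates sum to k. This is explicitly not the LapSum relaxation named in the SoftMoE summary, whose details the summary does not give; the function names, the temperature `tau`, and the example scores are assumptions.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_top_k(scores, k, tau=0.1, iters=100):
    """Return gates in (0, 1) that sum to approximately k.

    Gate i is sigmoid((score_i - t) / tau); the threshold t is found by
    bisection. Requires 0 < k < len(scores).
    """
    if not 0 < k < len(scores):
        raise ValueError("k must satisfy 0 < k < len(scores)")
    # At lo nearly every gate is 1; at hi nearly every gate is 0.
    lo = min(scores) - 50.0 * tau
    hi = max(scores) + 50.0 * tau
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(_sigmoid((s - mid) / tau) for s in scores)
        if total > k:
            lo = mid  # too many experts active: raise the threshold
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return [_sigmoid((s - t) / tau) for s in scores]


# Hypothetical router scores for 4 experts, softly selecting 2.
gates = soft_top_k([2.0, 1.0, 0.5, -1.0], k=2)
```

As `tau` shrinks, the gates approach a hard top-k mask; larger values give smoother gates, which is the property that makes gradient-based optimization of expert selection possible in relaxations of this kind.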

Differential Privacy in Gaussian Process Posterior Sampling

Gaussian process posterior sampling inherently provides differential privacy due to its intrinsic randomness. Explicit Rényi-DP bounds show that privacy depends on ridge regularisation, with membership-inference attacks confirming the predicted leakage patterns. Adding calibrated GP noise enhances privacy while maintaining utility in downstream tasks.

arXiv cs.LG · 8d ago

C2FL: Clustered Continual Federated Learning under Spatial and Temporal Drift

C2FL is a distributed federated learning approach that enables nodes to self-organize into spatial clusters based on geographic proximity. It addresses temporal drift by combining experience replay with dwell-time-aware adaptive averaging, allowing nodes to maintain updated, region-specific knowledge while adapting to evolving environmental conditions.

arXiv cs.LG · 8d ago

BLITZ: Fast and Calibrated Nonparametric Conditional Independence Test

BLITZ introduces a two-stage regression method for nonparametric conditional independence testing. It first removes broad smooth dependencies using polynomial regression, then applies shallow tree regressions to residualize nonlinear features, enabling accurate and fast testing with improved null calibration compared to existing methods.

arXiv cs.AI · 8d ago

STAR: SpatioTemporal Adaptive Reward Allocation for Text-to-Image RL Post-Training

STAR introduces a spatio-temporal reward allocation method for text-to-image generation, using attention maps to dynamically assign advantages across denoising steps. It improves semantic alignment, text rendering, and preference optimization in Stable Diffusion 3.5 Medium, achieving 0.9759 on GenEval, 0.9757 on OCR, and 23.60 on PickScore.

arXiv cs.AI · 8d ago

Catastrophic Forgetting is Low-Rank: A Function-Space Theory

A function-space theory reveals that catastrophic forgetting in continual adaptation concentrates in a small number of old-task NTK eigenmodes. In frozen-backbone linear-head PEFT-CL, the forgetting vector is exactly predictable up to numerical precision, with a Kronecker scaling rule for the vulnerable rank.

arXiv cs.AI · 8d ago

Volterra Generative Models Introduce Fractional Noise for Score-Based Generation

Volterra generative models propose a continuous-time score-based framework using fractional kernels to inject path-dependent noise, avoiding memoryless noising in traditional diffusion models. The approach employs finite-dimensional Markovian lifts and demonstrates improved generation on MNIST and CIFAR-10, with a bridge sampler enhancing stability for larger models.

arXiv cs.AI · 8d ago

S4oP: Operator-level Pruning for Efficient SSM Deployment

S4oP introduces an incremental, operator-level pruning method for S4 and S4D models, reducing inference cost by up to 70% while maintaining performance. The approach combines structured masking with fine-tuning and jointly tracks accuracy and latency, enabling efficient deployment of SSMs on resource-constrained devices.

arXiv cs.AI · 8d ago

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning

The paper introduces a framework for multi-policy multi-objective reinforcement learning that learns a set of Pareto-optimal policies ensuring fairness across diverse user preferences. It proves fair policies remain within the convex coverage set for concave welfare functions like GGF and proposes three algorithms that incorporate non-stationary and stochastic policies to adapt to historical inequities. Empirical results show these methods effectively learn fair policies across multiple domains.

arXiv cs.AI · 8d ago

Ternary Mamba: Pretrained QAT for Efficient SSM Compression

Ternary Mamba achieves 3.61x compression of Mamba-2 using grouped quantization-aware training from a pretrained checkpoint, reducing memory from 2,687 MB to 744 MB. It reaches 48.1% zero-shot accuracy with only 102M tokens and 4 GPU-hours, matching Bi-Mamba to within 0.9 percentage points, while revealing new instabilities caused by learnable quantization scales and by error accumulation in the recurrence.

arXiv cs.AI · 8d ago
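
As a rough illustration of what grouped ternary quantization can look like, the sketch below maps each group of weights to codes in {-1, 0, +1} with one scale per group. The absmean scale, the group size, the helper names, and the example weights are assumptions chosen for illustration; the Ternary Mamba summary does not state the paper's exact scheme, and the quantization-aware training loop is not shown. The last line checks the reported compression ratio against the reported memory figures.

```python
def ternary_quantize_group(weights, eps=1e-8):
    """Quantize one group to codes in {-1, 0, 1} plus a single scale.

    Assumption: the scale is the mean absolute weight of the group.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    codes = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return codes, scale


def quantize_grouped(weights, group_size):
    """Split a flat weight list into groups and quantize each one."""
    return [
        ternary_quantize_group(weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Reconstruct approximate weights as code * scale."""
    return [code * scale for codes, scale in groups for code in codes]


# Hypothetical weights: two groups with very different magnitudes.
weights = [0.9, -0.05, 0.4, -1.2, 0.02, 0.03, -0.01, 0.04]
groups = quantize_grouped(weights, group_size=4)
approx = dequantize(groups)

# Memory figures from the summary: 2,687 MB before, 744 MB after.
compression_ratio = 2687 / 744  # about 3.61
```

Per-group scales let small-magnitude groups keep their own resolution instead of being flattened to zero by a single global scale, which is one common motivation for grouping.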

Meta-Knowledge Reutilization in Reinforcement Learning

A new framework learns task-level knowledge on a simplified agent and transfers it to heterogeneous agents. It uses Bayesian non-parametric priors and a high-level policy to generate task guidance, with a semantic-magnitude interface and a temporal adaptor to align meta-knowledge with embodiment-specific controllers. Experiments show a 94.75% to 99.79% reduction in final-step tracking error and comparable performance using 23.8% of the interaction data required by state-of-the-art methods.