Training methods — korshunov.ai

Boundary Embedding Shaping for Graph Structural Disentanglement

Boundary Embedding Shaping (BES) addresses graph structural entanglement by selectively suppressing spurious neighbor correlations near class boundaries. BES uses adaptive contrastive learning to enhance boundary discrimination, improving GCN node classification by an average of 3.3% (up to 5.0% on WikiCS) and achieving superior link prediction accuracy.

arxiv arXiv cs.LG · 6d ago

Training LLMs for Long-Lifecycle Agents via Cross-Domain Generalization

A new framework enables large language models to develop 'Connect the Dots' capability, allowing long-lifecycle agents to learn from experiences and iteratively update their environment context. The framework uses reinforcement learning with long rollout sequences and custom tasks to promote cross-domain generalization, showing effective out-of-distribution performance in both domains and transition settings.

arxiv arXiv cs.LG · 6d ago

StreamKL: Fast and Memory-Efficient KL Divergence for Attention Distillation

StreamKL introduces a fused GPU primitive that eliminates quadratic memory usage in attention distillation by streaming query-key tiles through on-chip SRAM. It achieves up to 43x speedup in the forward pass and 14x in the backward pass, and reduces the extra HBM footprint from O(N_Q N_K) to O(1), enabling long-context distillation on a single GPU.

arxiv arXiv cs.LG · 6d ago
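
The streaming idea in the StreamKL entry above can be illustrated on the CPU. The sketch below is not the paper's fused kernel; it is a plain-Python illustration, under the assumption that the distillation loss is KL(softmax(teacher) || softmax(student)) per query row. It consumes one tile of key logits at a time and keeps only a few running scalars, so the full row of attention probabilities is never stored. The function name and the tile layout are hypothetical.

```python
import math

def streaming_attention_kl(teacher_tiles, student_tiles):
    """KL(softmax(teacher) || softmax(student)) for one query row.

    Both arguments are iterables of equally sized tiles of finite logits.
    Only O(1) running state is kept, independent of the number of keys.
    """
    m_t = m_s = -math.inf  # running maxima of teacher / student logits
    z_t = z_s = 0.0        # running sums of exp(logit - running max)
    acc = 0.0              # running sum of exp(t - m_t) * (t - s)
    for t_tile, s_tile in zip(teacher_tiles, student_tiles):
        new_m_t = max(m_t, max(t_tile))
        new_m_s = max(m_s, max(s_tile))
        # Rescale the accumulators so they refer to the new maxima.
        scale_t = math.exp(m_t - new_m_t)
        scale_s = math.exp(m_s - new_m_s)
        z_t *= scale_t
        acc *= scale_t
        z_s *= scale_s
        for t, s in zip(t_tile, s_tile):
            w = math.exp(t - new_m_t)
            z_t += w
            acc += w * (t - s)
            z_s += math.exp(s - new_m_s)
        m_t, m_s = new_m_t, new_m_s
    lse_t = m_t + math.log(z_t)  # log-sum-exp of teacher logits
    lse_s = m_s + math.log(z_s)  # log-sum-exp of student logits
    # KL = E_p[t - s] - lse_t + lse_s, with p = softmax(teacher).
    return acc / z_t - lse_t + lse_s
```

The running-max rescaling is the same online-softmax trick used by tiled attention kernels; a GPU version would apply it per tile in on-chip memory rather than per element in a Python loop.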

VIMPO: Critic-Free Policy Optimization for LLMs

VIMPO introduces a critic-free policy optimization method that derives a policy-implied value function from KL-regularized reinforcement learning. It enables verifiable reward incorporation without training a critic and outperforms GRPO on mathematical benchmarks, especially under noisy rewards.

arxiv arXiv cs.LG · 6d ago
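
The VIMPO entry above refers to a value function implied by the policy under KL-regularized RL. A textbook identity that such a derivation can rest on is that the KL-regularized optimum satisfies pi*(y) = pi_ref(y) * exp((r(y) - V) / beta), so V can be read off from any single response as r(y) - beta * log(pi*(y) / pi_ref(y)). Whether VIMPO uses exactly this form is an assumption here; the sketch only checks the identity on a small discrete example, and all function names are hypothetical.

```python
import math

def soft_value(ref_probs, rewards, beta):
    """V = beta * log E_{y ~ pi_ref}[exp(r(y) / beta)], computed stably."""
    m = max(rewards)
    total = sum(p * math.exp((r - m) / beta) for p, r in zip(ref_probs, rewards))
    return m + beta * math.log(total)

def optimal_policy(ref_probs, rewards, beta):
    """Closed-form optimum of the KL-regularized objective."""
    v = soft_value(ref_probs, rewards, beta)
    return [p * math.exp((r - v) / beta) for p, r in zip(ref_probs, rewards)]

def implied_value(policy_prob, ref_prob, reward, beta):
    """Value implied by one response: r(y) - beta * log(pi(y) / pi_ref(y))."""
    return reward - beta * math.log(policy_prob / ref_prob)
```

At the optimum, `implied_value` returns the same number for every response, which is why no separately trained critic is needed to evaluate it.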

LLM-based Hierarchical Control in Multi-Agent Games

A hierarchical system using a pretrained LLM to select RL skill policies outperforms flat RL in a 2v2 King of the Hill environment. It matches hand-crafted behavior tree performance in win rate and is perceived as more human-like by 60% of users, highlighting effective coordination and adaptability without manual rule design.

arxiv arXiv cs.LG · 6d ago

Information-Theoretic Analysis of Effective Supervision in Latent Chain-of-Thought

This paper identifies a dual collapse in latent reasoning: gradient attenuation and representational drift. It proposes Trajectory and Space Supervision, showing that generative reconstruction preserves information capacity better than geometric compression. The Unified Latent Probe measures mutual information between latent trajectories and reasoning steps, revealing an information-performance binding in reasoning accuracy.

arxiv arXiv cs.LG · 6d ago

Sensorimotor World Models for Action-Aligned Perception

A sensorimotor world model (SMWM) is introduced that learns compact, action-aligned latent representations from offline trajectories. It uses inverse dynamics regularization to prevent representation collapse and enable stable, interpretable world models without requiring frozen encoders or complex regularizers. SMWM achieves competitive planning performance in 2D and 3D control tasks.

arxiv arXiv cs.LG · 6d ago

Quantile of Means: Ensemble Method for Minimax Optimal RL

A new ensemble method for finite-horizon MDPs uses quantile-based estimates to achieve minimax optimal regret bounds. It eliminates reliance on count-based uncertainty and provides theoretical justification for ensemble-based exploration in reinforcement learning.

arxiv arXiv cs.LG · 6d ago

Off-Policy Evaluation for MNAR Rewards in MDPs

We propose an off-policy evaluation method for finite-horizon MDPs with rewards missing not at random. Our approach uses a reward-dependent propensity model and a bridge function to recover conditional mean rewards without modeling the MNAR mechanism, achieving consistency and finite-sample error bounds. Experiments on simulated and MIMIC-III Sepsis data show superior performance over existing methods.

arxiv arXiv cs.LG · 6d ago

SLiR: Shifting-based Linear Relaxations for Activation Functions

SLiR enables sound, tight linear relaxations of general activation functions using only Lipschitz constants or critical points. It verifies up to 7.8x more properties than state-of-the-art methods by efficiently computing upper and lower bounds via a shifting procedure.

arxiv arXiv cs.LG · 6d ago
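
To make the phrase "shifting procedure" in the SLiR entry above concrete, here is a minimal sketch of one generic way to obtain sound linear bounds from critical points. It is not the paper's algorithm. It assumes tanh as the activation, uses the chord slope over the input interval, and shifts that line up and down until it bounds the function. The function name is hypothetical.

```python
import math

def shifted_tanh_bounds(lo, hi):
    """Return (slope, lower_intercept, upper_intercept) such that
    slope * x + lower_intercept <= tanh(x) <= slope * x + upper_intercept
    for every x in [lo, hi]."""
    if not lo < hi:
        raise ValueError("need lo < hi")
    slope = (math.tanh(hi) - math.tanh(lo)) / (hi - lo)
    # The gap tanh(x) - slope * x is extremal at the endpoints or where
    # its derivative 1 - tanh(x)^2 - slope vanishes.
    candidates = [lo, hi]
    c = math.sqrt(max(0.0, 1.0 - slope))
    if c < 1.0:
        x_star = math.atanh(c)
        candidates += [x for x in (-x_star, x_star) if lo < x < hi]
    gaps = [math.tanh(x) - slope * x for x in candidates]
    # Shifting the line by the smallest / largest gap makes it sound.
    return slope, min(gaps), max(gaps)
```

For example, `shifted_tanh_bounds(-1.0, 2.0)` returns a slope and two intercepts defining parallel lines that enclose tanh on that interval.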

Statistical Properties of Training and Generalization

The article examines deep learning's deviation from classical statistical intuitions, emphasizing neural scaling laws and their interaction with physical constraints and inductive biases in machine learning applications.

arxiv arXiv cs.LG · 6d ago

Model-Driven Approach for RL Environment Families

A model-driven approach generates families of reinforcement learning environments using a hybrid genetic algorithm. Environment variants are created through model transformations guided by a state-of-the-art model transformation engine, enabling scalable and error-resistant development. The method is validated in wildfire mitigation and curriculum learning scenarios.

arxiv arXiv cs.LG · 6d ago

Recurrent neural networks approximate continuous functions

A single ReLU recurrent neural network with fixed weights and hidden dimension can uniformly approximate any continuous function on [-1,1] as its runtime increases. This is achieved via a new model, the Turing machine with neural units (TMNU), which balances algorithmic flexibility with bounded simulation by RNNs. The convergence rates match polynomial approximation rates, and minimax lower bounds confirm that runtime is an essential, unavoidable resource.

arxiv arXiv cs.LG · 6d ago

QCPIKAN: Quantum-Classical Physics-Informed KAN for PDEs

QCPIKAN is the first quantum-classical physics-informed Kolmogorov-Arnold network designed to solve partial differential equations. It uses Chebyshev-polynomial KAN layers and parameterized quantum circuits to embed physical constraints into training, achieving exponential error convergence and reduced numerical dispersion. Validated on seepage scenarios in porous media, it outperforms existing quantum-classical neural networks in prediction accuracy, error control, and dynamic tracking.

arxiv arXiv cs.LG · 6d ago

Quantum Ring All-Reduce: Communication and Privacy Advantages for Distributed Learning

A quantum version of ring all-reduce reduces per-link communication by a factor of two using entanglement and superdense coding, without altering model or gradient computations. It achieves information-theoretically secure aggregation via verified entanglement, with a 2x overhead in GHZ copies, and provides exponential communication advantages in gradient conflict detection for specific auditing tasks.

arxiv arXiv cs.LG · 6d ago
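
For context on the Quantum Ring All-Reduce entry above, the classical baseline can be simulated in a few lines. The sketch below models only classical ring all-reduce (a reduce-scatter phase followed by an all-gather phase) and none of the quantum protocol. It assumes each of the n workers holds exactly n scalar chunks, and it counts how many chunks cross each link. The function name is hypothetical.

```python
def ring_all_reduce(worker_chunks):
    """Simulate classical ring all-reduce (sum) over n workers.

    worker_chunks[i][c] is chunk c held by worker i; each worker must hold
    exactly n chunks. Returns (final_data, chunks_sent_per_link).
    """
    n = len(worker_chunks)
    data = [list(chunks) for chunks in worker_chunks]
    sent_per_link = 0
    # Reduce-scatter: after n - 1 steps worker i owns the full sum of
    # chunk (i + 1) % n.
    for step in range(n - 1):
        msgs = [(i, (i - step) % n, data[i][(i - step) % n]) for i in range(n)]
        for src, c, val in msgs:
            data[(src + 1) % n][c] += val
        sent_per_link += 1
    # All-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        msgs = [(i, (i + 1 - step) % n, data[i][(i + 1 - step) % n]) for i in range(n)]
        for src, c, val in msgs:
            data[(src + 1) % n][c] = val
        sent_per_link += 1
    return data, sent_per_link
```

Each link carries 2(n - 1) chunks, and each chunk is 1/n of the vector, so the classical per-link traffic is 2(n - 1)/n times the vector size. This is the quantity the entry says the quantum variant halves.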

Variance Reduction in Temporal Difference Learning

Temporal difference learning reduces variance by aggregating over multiple trajectories. The study shows that TD variance is asymptotically bounded above by that of Monte Carlo estimators, and that shorter-horizon updates reduce variance for a fixed number of samples. Direct Advantage Estimation acts as a regression-adjusted control variate, achieving tighter variance bounds than TD in large samples.

arxiv arXiv cs.CL · 6d ago

Sequential DPO Shows Variable Preference Impact Across Settings

A study of sequential Direct Preference Optimization finds that later training does not uniformly degrade earlier learned preferences. The effect varies by objective relationship, signal strength, and training order, ranging from partial degradation to positive transfer. Pair-level analysis reveals heterogeneous changes, with high-confidence preference pairs sometimes improving despite aggregate metric stability.

arxiv arXiv cs.CL · 6d ago
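
As background for the sequential DPO entry above, the standard per-pair DPO loss can be written down directly. The sketch below is the usual formulation, -log sigmoid(beta * margin), where the margin is the difference between the policy-versus-reference log-ratios of the chosen and rejected responses. It says nothing about how the study schedules its training stages. The function name and the default beta are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio)))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) = softplus(-m), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy equals the reference, the margin is zero and the loss is log 2. Tracking the margin per pair before and after a later training stage is one simple way to perform the kind of pair-level comparison the entry mentions.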

Bayesian Curriculum Learning on LLM Latent Manifolds

Manifold Bandits introduces Bayesian Manifold Curriculum (BMC), a framework that models problem sampling as a structured bandit problem in LLMs' latent space. BMC organizes tasks into a hierarchical tree and uses Bayesian learning to guide sampling, revealing tradeoffs between learning signal, task diversity, and evaluation relevance. Prioritizing difficulty alone fails to achieve strong downstream performance, underscoring the need for structure- and type-aware sampling.

arxiv arXiv cs.CL · 6d ago

Training LLMs for Long-Lifecycle Agents via Cross-Domain Generalization

A new framework enables large language models to learn 'Connect the Dots' by using reinforcement learning with long rollout sequences. The method includes tailored tasks and environments to foster meta-capability development, showing strong cross-domain generalization and performance in out-of-distribution settings. Implementations are available at https://github.com/agentscope-ai/Trinity-RFT/tree/research/cod/examples/research_cod.