Training methods
arXiv cs.LG · 6d ago

Training LLMs for Long-Lifecycle Agents via Cross-Domain Generalization

A new framework trains large language models to develop a 'Connect the Dots' capability, which lets long-lifecycle agents learn from experience and iteratively update their environment context. It uses reinforcement learning with long rollout sequences and custom tasks to promote cross-domain generalization, and reports effective out-of-distribution performance in both domain and transition settings.

arXiv cs.LG · 6d ago

Information-Theoretic Analysis of Effective Supervision in Latent Chain-of-Thought

This paper identifies a dual collapse in latent reasoning: gradient attenuation and representational drift. It proposes Trajectory and Space Supervision, showing that generative reconstruction preserves information capacity better than geometric compression. The Unified Latent Probe measures mutual information between latent trajectories and reasoning steps, revealing a binding between this mutual information and reasoning accuracy.

arXiv cs.LG · 6d ago

Recurrent neural networks approximate continuous functions

A single ReLU recurrent neural network with fixed weights and hidden dimension can uniformly approximate any continuous function on [-1,1] as its runtime increases. This is achieved via a new model, the Turing machine with neural units (TMNU), which balances algorithmic flexibility with bounded simulation by RNNs. The convergence rates match polynomial approximation rates, and minimax lower bounds confirm that runtime is an essential, unavoidable resource.

arXiv cs.LG · 6d ago

QCPIKAN: Quantum-Classical Physics-Informed KAN for PDEs

QCPIKAN is the first quantum-classical physics-informed Kolmogorov-Arnold network designed to solve partial differential equations. It uses Chebyshev-polynomial KAN layers and parameterized quantum circuits to embed physical constraints into training, achieving exponential error convergence and reduced numerical dispersion. Validated on seepage scenarios in porous media, it outperforms existing quantum-classical neural networks in prediction accuracy, error control, and dynamic tracking.

arXiv cs.LG · 6d ago

Quantum Ring All-Reduce: Communication and Privacy Advantages for Distributed Learning

A quantum version of ring all-reduce reduces per-link communication by a factor of two using entanglement and superdense coding, without altering model or gradient computations. It achieves information-theoretically secure aggregation via verified entanglement, with a 2x overhead in GHZ copies, and provides exponential communication advantages in gradient conflict detection for specific auditing tasks.
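
For context, the classical baseline can be sketched as follows. This is a minimal single-process simulation of standard ring all-reduce (reduce-scatter followed by all-gather), not the paper's quantum protocol. The function name `ring_all_reduce` and the divisibility assumption are illustrative choices, not taken from the paper.

```python
def ring_all_reduce(vectors):
    """Simulate classical ring all-reduce (sum) over n workers.

    vectors: list of n equal-length lists; the length must be divisible by n.
    Returns (per-worker results, number of elements carried by each link).
    """
    n = len(vectors)
    size = len(vectors[0])
    assert size % n == 0, "this sketch assumes the length is divisible by n"
    chunk = size // n
    data = [list(v) for v in vectors]
    sent_per_link = 0

    def span(c):
        # Index range covered by chunk c.
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. In step s, worker i sends chunk (i - s) mod n
    # to worker i + 1, which adds it to its own copy.
    for s in range(n - 1):
        # Snapshot outgoing chunks so all sends in a step are simultaneous.
        outgoing = [[data[i][k] for k in span((i - s) % n)] for i in range(n)]
        for i in range(n):
            dst = (i + 1) % n
            for off, k in enumerate(span((i - s) % n)):
                data[dst][k] += outgoing[i][off]
        sent_per_link += chunk

    # Worker i now holds the fully reduced chunk (i + 1) mod n.
    # Phase 2: all-gather. In step s, worker i forwards chunk (i + 1 - s) mod n
    # to worker i + 1, which overwrites its own copy.
    for s in range(n - 1):
        outgoing = [[data[i][k] for k in span((i + 1 - s) % n)] for i in range(n)]
        for i in range(n):
            dst = (i + 1) % n
            for off, k in enumerate(span((i + 1 - s) % n)):
                data[dst][k] = outgoing[i][off]
        sent_per_link += chunk

    return data, sent_per_link


results, per_link = ring_all_reduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

In this classical scheme each link carries 2(n - 1) chunks of size/n elements, which is the per-link volume the summary says the quantum variant halves.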

arXiv cs.CL · 6d ago

Sequential DPO Shows Variable Preference Impact Across Settings

A study of sequential Direct Preference Optimization finds that later training does not uniformly degrade earlier learned preferences. The effect varies by objective relationship, signal strength, and training order, ranging from partial degradation to positive transfer. Pair-level analysis reveals heterogeneous changes, with high-confidence preference pairs sometimes improving despite aggregate metric stability.
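
As background, the standard DPO loss for a single preference pair can be sketched as below. This is the usual published objective, not the study's code. In a sequential setup it would presumably be applied stage by stage with a different preference dataset each time. The function name, argument names, and the default `beta` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the trained
    policy and under the frozen reference model. beta=0.1 is an assumed default.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# A policy identical to the reference gives margin 0 and loss log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
```

Tracking this margin per pair before and after a later training stage is one way to see the pair-level heterogeneity the summary describes.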

arXiv cs.CL · 6d ago

Bayesian Curriculum Learning on LLM Latent Manifolds

Manifold Bandits introduces Bayesian Manifold Curriculum (BMC), a framework that models problem sampling as a structured bandit problem in LLMs' latent space. BMC organizes tasks into a hierarchical tree and uses Bayesian learning to guide sampling, revealing tradeoffs between learning signal, task diversity, and evaluation relevance. Prioritizing difficulty alone fails to achieve strong downstream performance, underscoring the need for structure- and type-aware sampling.
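
To illustrate the general idea of Bayesian bandit sampling over task groups, here is a minimal flat Beta-Bernoulli Thompson sampler. It is a deliberately simplified stand-in, not BMC itself: it ignores the hierarchical tree and the latent-space structure. The class name `BetaBandit` and the binary "useful learning signal" reward are hypothetical.

```python
import random


class BetaBandit:
    """Flat Thompson sampling over task clusters (illustrative, not BMC)."""

    def __init__(self, n_arms, seed=0):
        # Beta(1, 1) prior on each cluster's probability of yielding signal.
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms
        self.rng = random.Random(seed)

    def select(self):
        # Draw one sample per cluster from its posterior and pick the largest.
        draws = [self.rng.betavariate(a, b)
                 for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, reward):
        # reward is a hypothetical binary signal, e.g. whether training on a
        # problem from this cluster was judged useful.
        if reward:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0


bandit = BetaBandit(n_arms=3)
pulls = [0, 0, 0]
for _ in range(300):
    arm = bandit.select()
    pulls[arm] += 1
    bandit.update(arm, reward=(arm == 2))  # toy setup: only cluster 2 pays off
```

In this toy run sampling concentrates on the one cluster that pays off. A sampler driven by a single scalar reward like this is the kind of structure-blind baseline the summary argues is insufficient on its own.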

arXiv cs.CL · 6d ago

Training LLMs for Long-Lifecycle Agents via Cross-Domain Generalization

A new framework trains large language models to learn a 'Connect the Dots' capability using reinforcement learning with long rollout sequences. The method includes tailored tasks and environments to foster meta-capability development, and shows strong cross-domain generalization and out-of-distribution performance. Implementations are available at https://github.com/agentscope-ai/Trinity-RFT/tree/research/cod/examples/research_cod.