Training methods
arXiv cs.AI · 6d ago

Sensorimotor World Models for Action-Aligned Perception

A new sensorimotor world model (SMWM) learns compact, action-relevant latent representations from offline trajectories. It uses inverse dynamics regularization to prevent representation collapse and align latent states with controllable environmental degrees of freedom, enabling stable training without complex regularizers or frozen components. SMWM achieves competitive planning performance in 2D and 3D control tasks.
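As a rough illustration of why an inverse dynamics objective discourages representation collapse, the toy sketch below scores two encoders on how well the action can be recovered from consecutive latent states. Everything here is an assumption made for illustration and is not taken from the paper: the 1-D environment with `s_next = s + a`, the hand-written `difference_head`, and the names `inverse_dynamics_loss`, `informative`, and `collapsed` are all hypothetical.

```python
# Toy sketch (not the SMWM implementation): an inverse dynamics loss asks a
# head to recover the action a_t from the latent pair (z_t, z_{t+1}).

def inverse_dynamics_loss(encode, predict_action, transitions):
    """Mean squared error between predicted and true actions."""
    total = 0.0
    for state, action, next_state in transitions:
        predicted = predict_action(encode(state), encode(next_state))
        total += (predicted - action) ** 2
    return total / len(transitions)

# Hypothetical offline transitions (s, a, s_next) from a 1-D world where
# s_next = s + a.
transitions = [
    (0.0, 1.0, 1.0),
    (1.0, -0.5, 0.5),
    (0.5, 2.0, 2.5),
    (2.5, -1.0, 1.5),
]

def informative(state):
    # Encoder that keeps the controllable degree of freedom.
    return state

def collapsed(state):
    # Degenerate encoder that maps every state to the same latent.
    return 0.0

def difference_head(z, z_next):
    # Hand-written inverse model for this toy world.
    return z_next - z

loss_informative = inverse_dynamics_loss(informative, difference_head, transitions)
loss_collapsed = inverse_dynamics_loss(collapsed, difference_head, transitions)
print(loss_informative, loss_collapsed)  # 0.0 1.5625
```

With a collapsed encoder every latent pair looks the same, so any head can only output a constant and the loss stays well above zero. That penalty is what pushes the latent state to retain action-relevant information in this toy setting.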

arXiv cs.AI · 6d ago

Frequency-Aware Flow Matching for Robotic Action Generation

Frequency-Aware Flow Matching (FAFM) enables continuous and temporally consistent robotic action generation by transforming discrete action sequences into the frequency domain using the discrete cosine transform (DCT). It regularizes first-order temporal derivatives with a Sobolev-type constraint to ensure smooth actions, improving success rates, motion smoothness, and robustness across synthetic and real-world tasks without adding network parameters.
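The link between a discrete cosine transform and a first-derivative smoothness penalty can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not FAFM's actual loss: it uses an orthonormal DCT-II on a single 1-D action channel, approximates the temporal derivative with a first difference, and all function names and the sample `actions` sequence are hypothetical. It relies on a standard property of the DCT-II: the summed squared first differences of a sequence equal a frequency-weighted sum of squared coefficients with weights `2 - 2*cos(pi*k/N)`.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * k * (t + 0.5) / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs

def idct2(coeffs):
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    x = []
    for t in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += coeffs[k] * math.sqrt(2.0 / n) * math.cos(math.pi * k * (t + 0.5) / n)
        x.append(s)
    return x

def first_difference_energy(x):
    """Time-domain smoothness penalty: sum of squared first differences."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))

def spectral_first_difference_energy(coeffs):
    """The same penalty computed from DCT coefficients.

    Higher-frequency coefficients receive larger weights, which is the
    Sobolev-style behaviour: rough (high-frequency) motion costs more.
    """
    n = len(coeffs)
    return sum((2.0 - 2.0 * math.cos(math.pi * k / n)) * c ** 2
               for k, c in enumerate(coeffs))

# Hypothetical action chunk for one joint over 8 time steps.
actions = [0.0, 0.1, 0.35, 0.3, 0.6, 0.55, 0.9, 1.0]
coeffs = dct2(actions)

time_penalty = first_difference_energy(actions)
freq_penalty = spectral_first_difference_energy(coeffs)
reconstructed = idct2(coeffs)
print(time_penalty, freq_penalty)
```

The two printed penalties agree up to floating-point error, so in this sketch the smoothness term can be evaluated directly on frequency coefficients without adding any learnable parameters.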

arXiv cs.LG · 6d ago

Training LLMs for Long-Lifecycle Agents via Cross-Domain Generalization

A new framework trains large language models for a 'Connect the Dots' capability, in which long-lifecycle agents learn from experience and iteratively update their environment context. It uses reinforcement learning with long rollout sequences and custom tasks to promote cross-domain generalization, and reports effective out-of-distribution performance in both domains and transition settings.

arXiv cs.LG · 6d ago

Information-Theoretic Analysis of Effective Supervision in Latent Chain-of-Thought

This paper identifies a dual collapse in latent reasoning: gradient attenuation and representational drift. It proposes Trajectory and Space Supervision, showing that generative reconstruction preserves information capacity better than geometric compression. The Unified Latent Probe measures mutual information between latent trajectories and reasoning steps, revealing an information-performance binding in reasoning accuracy.