Training methods — korshunov.ai

Training methods

Learnable Global Merging for Variable-Length Tokenization in Diffusion Transformers

A novel variable-length tokenizer uses learnable global merging to enable cross-length representation alignment in diffusion models. This data-independent approach overcomes position-dependent semantics and improves the quality-compute trade-off on ImageNet 256×256 generation compared to prior methods.

arXiv cs.AI · 6d ago

Residual-Space Evolutionary Optimization via Flow-based Generative Models

A model-agnostic framework combines flow-based generative editing with evolutionary algorithms to enable data editing in non-differentiable settings. It operates in residual space, using self-pollination for local refinement and cross-pollination for broad exploration, validated on MorphoMNIST and crystal data to balance target alignment, instance preservation, and diversity.

arXiv cs.AI · 6d ago

Attention-Based SAC for Porosity Prediction in Additive Manufacturing

A multi-head attention feature extractor integrated with Soft Actor-Critic improves porosity prediction and process parameter optimization in laser powder bed fusion. The method achieves a convergence value of 322.79 in 14 episodes, outperforming DQN, PPO, TD3, and vanilla SAC with faster convergence and greater stability.

arXiv cs.AI · 6d ago

Hybrid Diffusion Transformer for Instruction-Guided Audio Editing

A hybrid two-stage diffusion transformer architecture enables efficient and accurate instruction-guided audio editing. It uses coarse-to-fine semantic alignment via joint attention at low resolution, followed by refined editing with alternating joint and cross-attention at high resolution. The method achieves better performance on complex editing tasks with improved efficiency and a compact model.

arXiv cs.AI · 6d ago

Sensorimotor World Models for Action-Aligned Perception

A new sensorimotor world model (SMWM) learns compact, action-relevant latent representations from offline trajectories. It uses inverse dynamics regularization to prevent representation collapse and align latent states with controllable environmental degrees of freedom, enabling stable training without complex regularizers or frozen components. SMWM achieves competitive planning performance in 2D and 3D control tasks.

arXiv cs.AI · 6d ago

Frequency-Aware Flow Matching for Robotic Action Generation

Frequency-Aware Flow Matching (FAFM) enables continuous and temporally consistent robotic action generation by transforming discrete action sequences into the frequency domain using discrete cosine transform. It regularizes first-order temporal derivatives with a Sobolev-type constraint to ensure smooth actions, improving success rates, motion smoothness, and robustness across synthetic and real-world tasks without adding network parameters.

arXiv cs.AI · 6d ago
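
The two ingredients named in the Frequency-Aware Flow Matching summary above, a discrete cosine transform of the action sequence and a penalty on first-order temporal differences, can be illustrated on a single action dimension. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`dct2`, `idct2`, `first_difference_penalty`) are hypothetical, the orthonormal DCT-II is one common convention, and the mean squared finite difference is only a discrete stand-in for a Sobolev-type constraint.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence (hypothetical helper)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of dct2: the transpose of the orthonormal DCT-II matrix."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
        out.append(s)
    return out


def first_difference_penalty(x):
    """Mean squared first-order difference; larger values mean jerkier actions."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:])) / (len(x) - 1)


# One action dimension over 16 time steps: a smooth and a jerky trajectory.
smooth = [math.sin(0.3 * t) for t in range(16)]
jerky = [0.5 * (-1) ** t for t in range(16)]

coeffs = dct2(smooth)          # frequency-domain representation
recovered = idct2(coeffs)      # maps back to the time domain
```

In this toy setting the jerky trajectory receives a much larger penalty than the smooth one, which is the behaviour a smoothness regularizer is meant to discourage.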

RACL: Reasoning-Agent Control Layer for Metaheuristic Learning

RACL introduces a reasoning agent that controls metaheuristic search behavior without replacing optimizers or altering constraints. It matches or outperforms the compared policies in vehicle routing experiments, reducing average cost by 8.337% versus the Fixed policy and 1.605% versus the Stagnation-Triggered policy, with no significant computational overhead.

arXiv cs.AI · 6d ago

Hybrid ANN-SNN Pipeline with Local Plasticity

A hybrid ANN-SNN pipeline uses pretrained EfficientNet encoders and converts their activations to spike trains via rate-coding. The system trains a CoLaNET spiking classifier with local plasticity rules, achieving 99.09% accuracy on ImageNet's 64-class benchmark, matching conventional deep networks.

arXiv cs.AI · 6d ago
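
Rate coding, as mentioned in the hybrid ANN-SNN summary above, turns a real-valued activation into a spike train whose firing rate reflects the activation's magnitude. The sketch below uses Bernoulli sampling per time step, which is one common variant; the summary does not say which variant the paper uses. The names `rate_code` and `firing_rates` and the max-normalisation are assumptions made for illustration.

```python
import random


def rate_code(activations, num_steps, rng):
    """Bernoulli rate coding (hypothetical helper).

    Each unit spikes at every time step with probability equal to its
    activation, clipped at zero and normalised by the largest activation.
    Returns a list of num_steps binary spike vectors.
    """
    peak = max(activations)
    if peak <= 0:
        probs = [0.0 for _ in activations]
    else:
        probs = [max(a, 0.0) / peak for a in activations]
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(num_steps)]


def firing_rates(spikes):
    """Fraction of time steps in which each unit spiked."""
    steps = len(spikes)
    return [sum(column) / steps for column in zip(*spikes)]


rng = random.Random(0)
encoder_output = [0.0, 0.5, 1.0, -0.2]   # stand-in for encoder activations
spikes = rate_code(encoder_output, num_steps=2000, rng=rng)
rates = firing_rates(spikes)
```

Over enough time steps the empirical firing rates approach the normalised activations, so a downstream spiking classifier sees the same ordering of feature strengths as the original encoder output.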

Modular-Sparsity Synchronization for PINN Training

ModSync addresses capacity-induced failure in PINNs by preventing functional modularity and self-partitioning of overparameterized networks. It enhances cross-objective interaction through structural optimization that penalizes task-exclusive connections while preserving interaction-promoting pathways.

arXiv cs.AI · 6d ago

Boundary Embedding Shaping for Graph Structural Disentanglement

Boundary Embedding Shaping (BES) addresses graph structural entanglement by selectively suppressing spurious neighbor correlations near class boundaries. BES uses adaptive contrastive learning to enhance boundary discrimination, improving GCN node classification by an average of 3.3% (up to 5.0% on WikiCS) and achieving superior link prediction accuracy.

arXiv cs.LG · 6d ago

Training LLMs for Long-Lifecycle Agents via Cross-Domain Generalization

A new framework enables large language models to develop 'Connect the Dots' capability, allowing long-lifecycle agents to learn from experiences and iteratively update their environment context. The framework uses reinforcement learning with long rollout sequences and custom tasks to promote cross-domain generalization, showing effective out-of-distribution performance in both domains and transition settings.

arXiv cs.LG · 6d ago

StreamKL: Fast and Memory-Efficient KL Divergence for Attention Distillation

StreamKL introduces a fused GPU primitive that eliminates quadratic memory usage in attention distillation by streaming query-key tiles through on-chip SRAM. It achieves up to 43x speedup in forward and 14x in backward passes, reducing extra HBM footprint from O(N_Q N_K) to O(1), enabling long-context distillation on a single GPU.

arXiv cs.LG · 6d ago
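
The idea of streaming tiles instead of materialising full attention rows, described in the StreamKL summary above, can be illustrated on the CPU for a single query row. The sketch below is not the paper's GPU kernel: it is a plain-Python illustration, under the assumption that the quantity of interest is KL(teacher || student) between two softmax distributions over the same keys. It keeps only a few running scalars per row (running maxima and rescaled sums, in the style of online softmax), so no full probability vector is ever stored. The names `streaming_kl` and `dense_kl` are hypothetical.

```python
import math


def streaming_kl(teacher_logits, student_logits, tile=4):
    """KL(softmax(teacher) || softmax(student)), one tile of keys at a time."""
    m_t = m_s = -math.inf   # running maxima of teacher / student logits
    z_t = z_s = 0.0         # running sums of exp(logit - running max)
    acc = 0.0               # running sum of exp(t - m_t) * (t - s)
    for start in range(0, len(teacher_logits), tile):
        t_tile = teacher_logits[start:start + tile]
        s_tile = student_logits[start:start + tile]
        new_m_t = max(m_t, max(t_tile))
        new_m_s = max(m_s, max(s_tile))
        # Rescale the accumulators to the new maxima (exp(-inf) is 0.0).
        scale_t = math.exp(m_t - new_m_t)
        scale_s = math.exp(m_s - new_m_s)
        z_t *= scale_t
        acc *= scale_t
        z_s *= scale_s
        for t, s in zip(t_tile, s_tile):
            w = math.exp(t - new_m_t)
            z_t += w
            acc += w * (t - s)
            z_s += math.exp(s - new_m_s)
        m_t, m_s = new_m_t, new_m_s
    # KL = E_p[t - s] - logsumexp(t) + logsumexp(s)
    return acc / z_t - (m_t + math.log(z_t)) + (m_s + math.log(z_s))


def dense_kl(teacher_logits, student_logits):
    """Reference implementation that materialises both softmax rows."""
    def softmax(v):
        m = max(v)
        e = [math.exp(x - m) for x in v]
        total = sum(e)
        return [x / total for x in e]
    p, q = softmax(teacher_logits), softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


teacher = [0.1 * i - 0.5 for i in range(10)]
student = [0.05 * i * i - 1.0 for i in range(10)]
```

On these inputs the streamed result agrees with the dense reference to floating-point precision, while the streamed version needs only constant extra state per row regardless of the number of keys.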

VIMPO: Critic-Free Policy Optimization for LLMs

VIMPO introduces a critic-free policy optimization method that derives a policy-implied value function from KL-regularized reinforcement learning. It enables verifiable reward incorporation without training a critic and outperforms GRPO on mathematical benchmarks, especially under noisy rewards.

arXiv cs.LG · 6d ago
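
The phrase "policy-implied value function" in the VIMPO summary above can be related to a standard identity of KL-regularized reinforcement learning. For the objective E_pi[r] - beta * KL(pi || pi_ref), the optimal policy is proportional to pi_ref * exp(r / beta), and the optimal value equals beta * log E_pi_ref[exp(r / beta)], so that value = r - beta * log(pi / pi_ref) holds for every response. The sketch below checks this on a toy discrete response set. Whether VIMPO uses exactly this form is not stated in the summary; the function names and the toy numbers are assumptions.

```python
import math


def kl_regularized_optimum(ref_probs, rewards, beta):
    """Closed-form optimum of E_pi[r] - beta * KL(pi || pi_ref) over a finite set."""
    weights = [p * math.exp(r / beta) for p, r in zip(ref_probs, rewards)]
    partition = sum(weights)
    policy = [w / partition for w in weights]
    value = beta * math.log(partition)
    return policy, value


def implied_value(policy_prob, ref_prob, reward, beta):
    """Value implied by a single response under the optimal policy."""
    return reward - beta * math.log(policy_prob / ref_prob)


ref_probs = [0.5, 0.3, 0.2]   # toy reference policy over three responses
rewards = [0.0, 1.0, 1.0]     # toy verifiable rewards (correct / incorrect)
beta = 0.5

policy, value = kl_regularized_optimum(ref_probs, rewards, beta)
implied = [implied_value(p, q, r, beta) for p, q, r in zip(policy, ref_probs, rewards)]
```

Every entry of `implied` equals `value`, which is why, at the optimum, a value estimate can be read off from policy log-ratios and rewards without a separately trained critic.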

LLM-based Hierarchical Control in Multi-Agent Games

A hierarchical system using a pretrained LLM to select RL skill policies outperforms flat RL in a 2v2 King of the Hill environment. It matches hand-crafted behavior tree performance in win rate and is perceived as more human-like by 60% of users, highlighting effective coordination and adaptability without manual rule design.

arXiv cs.LG · 6d ago

Information-Theoretic Analysis of Effective Supervision in Latent Chain-of-Thought

This paper identifies a dual collapse in latent reasoning: gradient attenuation and representational drift. It proposes Trajectory and Space Supervision, showing that generative reconstruction preserves information capacity better than geometric compression. The Unified Latent Probe measures mutual information between latent trajectories and reasoning steps, revealing an information-performance binding in reasoning accuracy.

arXiv cs.LG · 6d ago

Sensorimotor World Models for Action-Aligned Perception

A sensorimotor world model (SMWM) is introduced that learns compact, action-aligned latent representations from offline trajectories. It uses inverse dynamics regularization to prevent representation collapse and enable stable, interpretable world models without requiring frozen encoders or complex regularizers. SMWM achieves competitive planning performance in 2D and 3D control tasks.

arXiv cs.LG · 6d ago

Quantile of Means: Ensemble Method for Minimax Optimal RL

A new ensemble method for finite-horizon MDPs uses quantile-based estimates to achieve minimax optimal regret bounds. It eliminates reliance on count-based uncertainty and provides theoretical justification for ensemble-based exploration in reinforcement learning.

arXiv cs.LG · 6d ago
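
The summary above does not spell out the estimator, so the following is only a guess at the general shape of a "quantile of means" statistic: several ensemble members each compute a mean, and a quantile of those means serves as the estimate (an upper quantile giving an optimistic value for exploration). Here the members are bootstrap resamples of observed returns. The bootstrap construction, the function name `quantile_of_means`, and the index rule for the quantile are all assumptions, not the paper's algorithm.

```python
import random


def quantile_of_means(samples, num_members, q, rng):
    """Quantile (0 <= q <= 1) of the means of bootstrap-resampled ensemble members."""
    means = []
    for _ in range(num_members):
        resample = [rng.choice(samples) for _ in samples]
        means.append(sum(resample) / len(resample))
    means.sort()
    index = min(len(means) - 1, int(q * len(means)))
    return means[index]


observed_returns = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0]

# Re-seeding gives both calls the same ensemble, so only the quantile differs.
optimistic = quantile_of_means(observed_returns, 50, 0.9, random.Random(0))
pessimistic = quantile_of_means(observed_returns, 50, 0.1, random.Random(0))
```

The spread between the upper and lower quantiles shrinks as more data is observed, which plays a role similar to a count-based bonus without tracking visit counts explicitly.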

Off-Policy Evaluation for MNAR Rewards in MDPs

This off-policy evaluation method targets finite-horizon MDPs with rewards missing not at random (MNAR). It uses a reward-dependent propensity model and a bridge function to recover conditional mean rewards without modeling the MNAR mechanism, achieving consistency and finite-sample error bounds. Experiments on simulated and MIMIC-III Sepsis data show superior performance over existing methods.

arXiv cs.LG · 6d ago

SLiR: Shifting-based Linear Relaxations for Activation Functions

SLiR enables sound, tight linear relaxations of general activation functions using only Lipschitz constants or critical points. It verifies up to 7.8x more properties than state-of-the-art methods by efficiently computing upper and lower bounds via a shifting procedure.
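
To make the idea of shifting a line into sound bounds concrete, here is a deliberately simple sketch that uses only a Lipschitz constant. It is not SLiR's procedure, and it is looser than what the summary describes: it takes a line through the interval midpoint and shifts it up and down by (L + |slope|) * radius, which is enough to guarantee that the line stays above or below the function on the whole interval. The function name `shifted_linear_bounds` and the choice of tanh with L = 1 are assumptions for illustration.

```python
import math


def shifted_linear_bounds(f, lipschitz, lower, upper, slope=0.0):
    """Sound linear lower/upper bounds for f on [lower, upper].

    Returns ((a, b_low), (a, b_up)) such that
    a * x + b_low <= f(x) <= a * x + b_up for all x in the interval,
    assuming |f(x) - f(y)| <= lipschitz * |x - y|.
    """
    center = 0.5 * (lower + upper)
    radius = 0.5 * (upper - lower)
    # The line through (center, f(center)) deviates from f by at most
    # (lipschitz + |slope|) * radius anywhere on the interval.
    shift = (lipschitz + abs(slope)) * radius
    base = f(center) - slope * center
    return (slope, base - shift), (slope, base + shift)


# tanh is 1-Lipschitz; bound it on [-1, 2] with a line of slope 0.5.
(low_a, low_b), (up_a, up_b) = shifted_linear_bounds(math.tanh, 1.0, -1.0, 2.0, slope=0.5)
grid = [-1.0 + 3.0 * i / 200 for i in range(201)]
```

The bounds are sound by construction but far from tight; choosing the slope and the shift more carefully, for example from critical points of the activation, is where a method like the one summarized above would improve on this sketch.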