Training methods — korshunov.ai

Volterra Generative Models Introduce Fractional Noise for Score-Based Generation

Volterra generative models are a continuous-time score-based framework that uses fractional kernels to inject path-dependent noise, replacing the memoryless noising of traditional diffusion models. The approach relies on finite-dimensional Markovian lifts and reports improved generation on MNIST and CIFAR-10, with a bridge sampler that improves stability for larger models.

arxiv arXiv cs.AI · 8d ago

S4oP: Operator-level Pruning for Efficient SSM Deployment

S4oP introduces an incremental, operator-level pruning method for S4 and S4D models, reducing inference cost by up to 70% while maintaining performance. The approach combines structured masking with fine-tuning and jointly tracks accuracy and latency, enabling efficient deployment of SSMs on resource-constrained devices.

arxiv arXiv cs.AI · 8d ago

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning

The paper introduces a framework for multi-policy multi-objective reinforcement learning that learns a set of Pareto-optimal policies ensuring fairness across diverse user preferences. It proves fair policies remain within the convex coverage set for concave welfare functions like GGF and proposes three algorithms that incorporate non-stationary and stochastic policies to adapt to historical inequities. Empirical results show these methods effectively learn fair policies across multiple domains.
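To make the welfare function concrete, here is a minimal sketch of the generalized Gini function (GGF): a weighted sum over objective values sorted in ascending order, with non-increasing weights so the worst-off objective counts most. The weights and return vectors are hypothetical illustrations, not values from the paper.

```python
def ggf(values, weights):
    """Generalized Gini welfare: weighted sum of values sorted ascending.

    Weights must be non-negative and non-increasing, so the smallest
    (worst-off) objective value receives the largest weight.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if any(a < b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be non-increasing")
    return sum(w * v for w, v in zip(weights, sorted(values)))


# Hypothetical per-objective returns for two policies with the same total (12).
weights = [0.5, 0.3, 0.2]
balanced = [4.0, 4.0, 4.0]
skewed = [10.0, 1.0, 1.0]

print(ggf(balanced, weights), ggf(skewed, weights))
```

Both policies have the same summed return, but the balanced one scores higher under GGF, which is the sense in which a concave welfare function favors fair outcomes.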

arxiv arXiv cs.AI · 8d ago

Ternary Mamba: Pretrained QAT for Efficient SSM Compression

Ternary Mamba achieves 3.61x compression of Mamba-2 using grouped quantization-aware training from a pretrained checkpoint, reducing memory from 2,687 to 744 MB. It reaches 48.1% zero-shot accuracy with only 102M tokens and 4 GPU-hours, matching Bi-Mamba within 0.9 percentage points, while revealing new instability from learnable quantization scales and error accumulation in recurrence.
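As a rough illustration of grouped ternary quantization, the sketch below maps each group of weights to codes in {-1, 0, +1} with one scale per group, then checks the reported compression ratio from the two memory figures. The fixed absmean scale is an assumption made for simplicity. The summary mentions learnable scales, and the paper's actual scheme is not described here.

```python
def ternarize_group(group):
    """Quantize one group to codes in {-1, 0, +1} plus a single scale.

    Assumption: the scale is the mean absolute value of the group.
    """
    scale = sum(abs(w) for w in group) / len(group)
    if scale == 0.0:
        return [0] * len(group), 0.0
    codes = [max(-1, min(1, round(w / scale))) for w in group]
    return codes, scale


def ternarize(weights, group_size):
    """Split a flat weight list into groups and ternarize each one."""
    return [
        ternarize_group(weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    ]


def dequantize(quantized):
    """Reconstruct approximate weights from (codes, scale) pairs."""
    return [c * scale for codes, scale in quantized for c in codes]


# Hypothetical weights, one group of four.
example = ternarize([0.9, -1.1, 0.05, 0.0], group_size=4)

# Compression ratio implied by the memory figures in the summary.
ratio = 2687 / 744
print(example, round(ratio, 2))
```

The ratio of the two reported memory footprints agrees with the stated 3.61x compression.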

arxiv arXiv cs.AI · 9d ago

Meta-Knowledge Reutilization in Reinforcement Learning

A new framework learns task-level knowledge on a simplified agent and transfers it to heterogeneous agents. It uses Bayesian non-parametric priors and a high-level policy to generate task guidance, with a semantic-magnitude interface and temporal adaptor to align meta-knowledge with embodiment-specific controllers. Experiments show 94.75% to 99.79% reduction in final-step tracking error and comparable performance using 23.8% of the interaction data of state-of-the-art methods.

arxiv arXiv cs.AI · 9d ago

Kolmogorov Regression for Robust Diffusion Policies

A backward Kolmogorov equation lifts diffusion policies to a Cameron-Martin space, replacing stochastic score matching with a deterministic PDE. This approach achieves convergence bounds tied to kernel effective rank, improved trajectory regularity, and a failure detector without rewards, showing 17% higher reward and 67.6% reduced drift on PushT, and 28.4% lower RMSE with perfect bottleneck detection on a manufacturing line. Hamilton-Jacobi theory reduces deadlock events by 96% in simulations.

arxiv arXiv cs.AI · 9d ago

FPRM: Fixed-Point Reasoning Model with Adaptive Compute

FPRM is a Transformer-based model that uses fixed-point convergence as an end-to-end halting mechanism in a looped architecture. It adapts compute to task difficulty by leveraging fixed-point reasoning, outperforming baseline models on Sudoku, Maze, state-tracking, and ARC-AGI benchmarks.
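The halting idea can be sketched generically: apply the same update repeatedly and stop once the state no longer changes by more than a tolerance, so inputs that start further from the fixed point use more iterations. The `toy_step` function, tolerance, and step cap below are hypothetical stand-ins, not FPRM's looped Transformer block or its actual halting rule.

```python
def iterate_to_fixed_point(step, state, tol=1e-6, max_steps=100):
    """Apply `step` until successive states differ by less than `tol`.

    Returns the final state and the number of steps actually used.
    """
    for n in range(1, max_steps + 1):
        new_state = step(state)
        delta = max(abs(a - b) for a, b in zip(new_state, state))
        state = new_state
        if delta < tol:
            return state, n
    return state, max_steps


def toy_step(state):
    """Toy contraction with fixed point 2.0 in every coordinate."""
    return [0.5 * x + 1.0 for x in state]


# An "easy" input already at the fixed point and a "hard" one far from it.
easy_state, easy_steps = iterate_to_fixed_point(toy_step, [2.0])
hard_state, hard_steps = iterate_to_fixed_point(toy_step, [100.0])
print(easy_steps, hard_steps)
```

The easy input halts after a single step, while the hard input needs many more before the update falls below the tolerance, which is the adaptive-compute behavior the summary describes.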

arxiv arXiv cs.AI · 9d ago

Looped World Models Achieve 100x Parameter Efficiency

Looped World Models (LoopWM) introduce a looped architecture that iteratively refines latent environment states using a parameter-shared transformer. This approach achieves up to 100x parameter efficiency over conventional world models by adapting computation depth to each prediction's complexity.

arxiv arXiv cs.CL · 9d ago

Negative Token Filtering for Stable Single-Rollout RL

A new approach called negative token filtering enables stable single-rollout training by preventing false penalties on negative samples. The method improves performance on agentic tasks compared to group-based RL techniques, while matching group-based methods on reasoning tasks.

arxiv arXiv cs.CL · 9d ago

Expressivity Analysis of Hierarchical Modelling in Deep Transformers

This paper analyzes deep transformer expressiveness using bounded-depth grammars. It constructs transformers with positional attention where model depth scales linearly with grammar depth, and neuron count grows quadratically with production rules. The results support the linear representation hypothesis by showing these models can encode abstract grammatical states in low-dimensional, linearly separable subspaces.

arxiv arXiv cs.CL · 9d ago

NAR-MBR Decoding for Fast and Accurate Speech Recognition

NAR-MBR decoding improves speech recognition by maximizing expected utility from sampled outputs of non-autoregressive models. It achieves better performance than prior NAR methods and runs faster than autoregressive decoding across multiple corpora.
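A minimal sketch of minimum Bayes risk (MBR) selection over sampled hypotheses: each sample is scored by its average utility against all samples, and the highest-scoring one is returned. The utility used here (one minus normalized word-level edit distance) and the sample transcripts are assumptions for illustration. The summary does not say which utility or sampler the paper uses.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def utility(hyp, ref):
    """Assumed utility: 1 - normalized word-level edit distance."""
    h, r = hyp.split(), ref.split()
    return 1.0 - edit_distance(h, r) / max(len(h), len(r), 1)


def mbr_decode(samples):
    """Return the sample with the highest average utility against all samples."""
    def expected_utility(hyp):
        return sum(utility(hyp, other) for other in samples) / len(samples)
    return max(samples, key=expected_utility)


# Hypothetical sampled transcripts for one utterance.
samples = ["the cat sat", "the cat sat", "the cat sad", "a cat sat"]
best = mbr_decode(samples)
print(best)
```

The samples act as pseudo-references, so the selected output is the one most consistent with the rest of the sampled set.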

arxiv arXiv cs.CL · 9d ago

EnvRL: Leveraging Environment Dynamics in Agentic RL

EnvRL introduces a framework that enhances agentic reinforcement learning by incorporating environment dynamics through state prediction and inverse dynamics objectives. It achieves significant gains in success rates on long-horizon benchmarks, improving Qwen-2.5-1.5B-Instruct performance from 72.8% to 77.4% on ALFWorld and from 56.8% to 67.0% on WebShop when trained with GRPO.

arxiv arXiv cs.CL · 9d ago

LLM-Designed Training Environment for RL with Multi-Agent Reasoning

The LLM-as-Environment-Engineer framework uses LLMs to automatically redesign training environments in reinforcement learning by analyzing failure trajectories and contextual data. On the MAPF-FrozenLake testbed, it outperforms larger proprietary LLMs and fixed-environment baselines, with Qwen3-4B achieving the strongest aggregate performance. Analysis shows that failure evidence and preserved working configurations are key, and the current RL checkpoint performs better than the base model as an environment engineer.

arxiv arXiv cs.CL · 9d ago

SuCo: Sufficiency-guided Continuous Adaptive Reasoning

SuCo defines Minimal Sufficient CoT (MSC) as the shortest reasoning prefix sufficient to reach a correct answer. It employs a two-stage training framework (MSC-Aligned Fine-Tuning and Sufficiency-Aware Policy Optimization) to reduce reasoning length while maintaining or improving accuracy across math, code, and science tasks.

arxiv arXiv cs.CL · 9d ago

Dynamic Rollout Editing Reduces Overthinking in RL-Trained Reasoning Models

Dynamic Rollout Editing (DRE) addresses overthinking in RL-trained reasoning models by editing successful trajectories after the answer has emerged. DRE preserves the correct reasoning prefix and edits the unnecessary continuation, which weakens the credit assigned to redundant thinking without penalizing valid reasoning. Experiments across diverse tasks show that it reduces overthinking.

media r/LocalLLaMA · 9d ago

Community model build thread: Crowdsourced training feasible

A community model can be built through crowdsourced compute using a 'Branch-Train-Stitch' approach. Participants train a prototype model on their hardware, submit narrow-domain submodels, and organizers stitch them into a large Mixture-of-Experts (MoE) model, with key decisions including prototype size, scope definitions, and training protocols.

media Interconnects · 9d ago

Frontier Post-Training Recipe Review with Finbarr Timbers

The podcast reviews the evolution of post-training recipes in large language models, from InstructGPT to 2026 frontier models. It highlights Multi-Teacher On-Policy Distillation (MOPD) as the dominant pattern, where domain-specialist models are trained and then distilled into a general student model via on-policy distillation, scaling to over 10 teachers in models like DeepSeek V4 and Nemotron 3 Ultra.

media r/LocalLLaMA · 9d ago

Pooling GPUs to train a community model

A Reddit user asks whether anyone has successfully pooled GPUs to train a community model, citing challenges such as latency and weight poisoning. The post also asks whether existing distributed volunteer-computing projects have managed to train such a model.

arxiv arXiv cs.CL · 10d ago

Contrastive-Difference CKA Reveals Concept-Specific Alignment Across LLM Architectures

A training-free diagnostic, contrastive-difference CKA (CKA_Delta), identifies concept-specific structural alignment across language model architectures. It detects geometric convergence and functional transfer across six concept domains, including non-instructional tasks, with significant discrimination where standard CKA fails. Results suggest universality may strengthen with model scale, though further validation is needed.
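For reference, standard linear CKA between two activation matrices (rows are examples, columns are features) can be sketched as below. The `contrastive_difference` helper reflects one plausible reading of the name, namely computing CKA on differences between activations for paired contrastive inputs. That reading is an assumption, not the paper's confirmed definition.

```python
def center_columns(m):
    """Subtract each column's mean (rows are examples)."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_sq_norm(a, b):
    """Squared Frobenius norm of A^T B, computed column by column."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA: ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F)."""
    xc, yc = center_columns(x), center_columns(y)
    denom = (cross_sq_norm(xc, xc) * cross_sq_norm(yc, yc)) ** 0.5
    return cross_sq_norm(xc, yc) / denom


def contrastive_difference(pos, neg):
    """Assumed preprocessing: per-example difference of paired activations."""
    return [[p - q for p, q in zip(rp, rn)] for rp, rn in zip(pos, neg)]


# Hypothetical activations: 4 examples, 2 features.
x = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
rotated = [[-b, a] for a, b in x]  # same geometry, rotated by 90 degrees
print(linear_cka(x, x), linear_cka(x, rotated))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling, so the rotated copy scores as fully aligned with the original.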

arxiv arXiv cs.CL · 10d ago

Key Properties for Effective Code Interpreter Reasoning

A study identifies extrinsic (crucial tokens) and intrinsic (cognitive behaviors) properties that enhance code interpreter reasoning in large language models. Stronger reasoning models show higher prevalence of verification, backtracking, and backward chaining, with these properties improving performance during inference and training, reducing overthinking and boosting token efficiency.