Training methods — korshunov.ai

Discriminator-Guided RL Corrects Flow Matching with Data-Aligned Rewards

Discriminator-Guided RL (DRL) trains a discriminator in a pretrained representation space to separate real data from model-generated samples. The discriminator's logit serves as the reward in KL-regularized RL, aligning model outputs with visual and semantic realism without human preferences. DRL improves FID and semantic FD across models such as SiT and JiT, and advances the Pareto frontier between preference and fidelity.
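
As a rough illustration of the reward described above, the sketch below combines a discriminator logit with a KL penalty toward a reference model. It is not the paper's implementation: the linear discriminator, the function names, and the coefficient `beta` are all assumptions made for the example.

```python
def discriminator_logit(features, weights, bias):
    """Score a sample's pretrained-space features with a linear discriminator.

    Hypothetical stand-in: a real discriminator would be a trained network.
    """
    return sum(w * x for w, x in zip(weights, features)) + bias


def kl_regularized_reward(logit, logp_policy, logp_reference, beta):
    """Discriminator logit minus a KL penalty toward the reference model.

    The log-ratio (logp_policy - logp_reference) is a single-sample estimate
    of the KL term; beta is an assumed penalty coefficient.
    """
    return logit - beta * (logp_policy - logp_reference)


# Toy usage with made-up numbers.
logit = discriminator_logit([1.0, 2.0], weights=[0.5, -0.25], bias=0.1)
reward = kl_regularized_reward(logit, logp_policy=-1.0, logp_reference=-1.5, beta=0.2)
```

A higher logit means the sample looks more like real data to the discriminator, while the penalty discourages the policy from drifting far from the reference model.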

arXiv cs.LG · 9d ago

Essential Subspace Merging for Multi-Task Learning

Essential Subspace Merging (ESM) reduces inter-task interference by focusing on principal directions of activation shifts. ESM++ extends this with dynamic expert selection via prototype-based routing, enabling efficient, training-free multi-task model merging.
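
One way to picture "principal directions of activation shifts" is as the top singular direction of a matrix whose rows are activation differences between a fine-tuned model and its base. The sketch below finds that direction by power iteration. This reading of the summary, the function name, and the toy data are assumptions, not details taken from the paper.

```python
import math


def top_shift_direction(shifts, iterations=100):
    """Return the dominant direction of a list of activation-shift vectors.

    Runs power iteration on S^T S, where S stacks the shift vectors as rows.
    """
    dim = len(shifts[0])
    direction = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        # Apply S^T S to the current direction without forming the matrix.
        projections = [sum(row[j] * direction[j] for j in range(dim)) for row in shifts]
        updated = [sum(p * row[j] for p, row in zip(projections, shifts)) for j in range(dim)]
        norm = math.sqrt(sum(c * c for c in updated))
        direction = [c / norm for c in updated]
    return direction


# Toy shifts that vary almost entirely along the first coordinate.
toy_shifts = [[2.0, 0.1], [-1.9, 0.0], [2.1, -0.1]]
direction = top_shift_direction(toy_shifts)
```

In this toy case the recovered direction is almost exactly the first axis, which is where nearly all of the shift energy lies.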

arXiv cs.LG · 9d ago

Safety Reflection Pretraining for LLMs

Safety Reflection Pretraining inserts short safety reflections into pretraining data to enable self-monitoring in language models. Experiments with 1.7B models on FineWeb-Edu show improved safety accuracy and reduced attack success rates, and MedSafetyWorld experiments show that the method prevents unsafe behaviors from generalizing from safe data more effectively than data filtering or rewriting.

arXiv cs.LG · 9d ago

Batch Size Tradeoffs in Stochastic Momentum Methods

Stochastic momentum methods such as HB and ASGD show distinct batch-size tradeoffs between compute efficiency and serial runtime. HB maintains SGD-level compute efficiency over a batch-size window extending up to a factor of $\sqrt{\kappa}$ beyond SGD's critical batch size, while ASGD improves small-batch efficiency for rapidly decaying spectra but sacrifices efficiency at larger batch sizes in exchange for reduced serial runtime.
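
For readers unfamiliar with the update rule, here is a minimal sketch of mini-batch heavy-ball momentum, assuming HB in the summary denotes Polyak's heavy-ball method. The quadratic objective, noise model, and hyperparameters are illustrative choices, not values from the paper, and the sketch does not reproduce the paper's batch-size analysis.

```python
import random


def heavy_ball_step(x, velocity, grad, lr, momentum):
    """One heavy-ball update: velocity accumulates past gradients."""
    velocity = momentum * velocity - lr * grad
    return x + velocity, velocity


def minibatch_grad(x, batch_size, rng):
    """Noisy gradient of f(x) = 0.5 * x**2, averaged over a mini-batch."""
    return sum(x + rng.gauss(0.0, 1.0) for _ in range(batch_size)) / batch_size


def run_heavy_ball(batch_size, steps=500, lr=0.1, momentum=0.9, seed=0):
    """Minimize the toy quadratic from x = 5.0 and return the final iterate."""
    rng = random.Random(seed)
    x, velocity = 5.0, 0.0
    for _ in range(steps):
        grad = minibatch_grad(x, batch_size, rng)
        x, velocity = heavy_ball_step(x, velocity, grad, lr, momentum)
    return x


final_x = run_heavy_ball(batch_size=64)
```

Averaging over a larger batch lowers the gradient noise per step at the cost of more gradient evaluations, which is the basic tension behind the compute-versus-runtime tradeoff the summary refers to.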

arXiv cs.LG · 9d ago

AGDN: Solving Traveling Salesman Problem with Anisotropic Graph Diffusion

AGDN introduces a graph neural network framework that addresses topological priors and connectivity loss in TSP. It uses a MixScore transition matrix and anisotropic diffusion to enable efficient information exchange, outperforming existing methods across diverse problem sizes and distributions while maintaining competitive computation time. The implementation is available on GitHub.

arXiv cs.LG · 10d ago

Decision-Focused RL for EV Charging with Unknown Departure Times

A new decision-focused RL framework jointly trains a forecaster and charging controller to handle unknown EV departure times. By aligning forecast accuracy with downstream decision quality, the method achieves up to 14% higher total reward and a 55% reduction in unsupplied energy compared to standard RL approaches.

arXiv cs.LG · 10d ago

MAST Enables Selective Unlearning in RLVR-Induced Reasoning

MAST, a mechanism-guided unlearning method, achieves targeted forgetting of RLVR-induced reasoning with minimal collateral damage. On Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, it significantly reduces MATH performance (45/150 to 37/150) while GSM8K accuracy shifts by +0.8 points and MATH retention by -0.5 points. Results hold across different seeds, objectives, and models, showing superior stability over full-parameter unlearning.

arXiv cs.LG · 10d ago

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

STARE addresses policy entropy collapse in GRPO-based reinforcement learning by identifying entropy-critical token subsets via surprisal quantiles and reweighting their advantages. It maintains stable policy entropy across model scales and tasks, outperforming DAPO and other baselines by 4%-8% on AIME24 and AIME25, with consistent exploration-exploitation balance.
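
The general idea of selecting tokens by surprisal quantile and rescaling their advantages can be sketched as follows. The summary does not specify the selection rule or the weights, so the threshold rule, the `quantile` and `weight` parameters, and the function name here are hypothetical.

```python
def reweight_by_surprisal(token_logprobs, advantages, quantile=0.5, weight=0.5):
    """Scale the advantages of tokens whose surprisal is at or above a quantile.

    Surprisal is the negative log-probability of the sampled token. Both the
    quantile cut and the scaling weight are assumed values for illustration.
    """
    if len(token_logprobs) != len(advantages):
        raise ValueError("token_logprobs and advantages must have equal length")
    surprisals = [-lp for lp in token_logprobs]
    ranked = sorted(surprisals)
    cut_index = min(len(ranked) - 1, int(quantile * len(ranked)))
    threshold = ranked[cut_index]
    return [
        adv * weight if s >= threshold else adv
        for s, adv in zip(surprisals, advantages)
    ]


# Toy usage: five tokens that share the same sequence-level advantage.
logprobs = [-0.1, -0.2, -3.0, -0.05, -2.5]
reweighted = reweight_by_surprisal(logprobs, [1.0] * 5)
```

Whether high-surprisal tokens should be up-weighted or down-weighted depends on the method, so `weight` is left as a free parameter here.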

arXiv cs.LG · 10d ago

Graph Neural Networks Accelerate Algebraic Multigrid Pressure Solver

A graph neural network enhances algebraic multigrid solvers by predicting optimal polynomial coefficients for sparse pseudo-inverse operators. The method reduces V-cycle iterations and achieves wall-clock speedups of 4% to 37% across benchmarks, with robust performance on meshes up to 128 times larger than training data and on unseen industry problems like AirfRANS.

arXiv cs.LG · 10d ago

Large Language Gibbs for Structured Probabilistic Inference

Large Language Gibbs uses LLM conditional distributions as transition operators for iterative variable resampling. This method enables coherent, order-independent probabilistic inference by achieving a stationary distribution that balances local conditionals, offering a practical alternative to single-pass generation for structured reasoning tasks.
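
The resampling loop described above follows the pattern of a standard Gibbs sampler. In the sketch below a fixed lookup function stands in for the LLM's conditional distribution; that stand-in, the two binary variables, and the probabilities are assumptions chosen so the stationary distribution is known in closed form.

```python
import random


def conditional_prob_one(other_value):
    """Hypothetical stand-in for an LLM conditional: P(variable = 1 | other)."""
    return 0.8 if other_value == 1 else 0.2


def gibbs_sample(num_sweeps, seed=0):
    """Alternately resample x given y and y given x, recording each sweep."""
    rng = random.Random(seed)
    x, y = 0, 0
    samples = []
    for _ in range(num_sweeps):
        x = 1 if rng.random() < conditional_prob_one(y) else 0
        y = 1 if rng.random() < conditional_prob_one(x) else 0
        samples.append((x, y))
    return samples


samples = gibbs_sample(20000)
# Under the joint implied by these conditionals, x and y agree 80% of the time.
agreement = sum(1 for x, y in samples if x == y) / len(samples)
```

Because the two conditionals are compatible with a single joint distribution, the chain's long-run statistics do not depend on the starting state. With real LLM conditionals that compatibility is not guaranteed, which is the situation the paper's stationary-distribution argument addresses.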

arXiv cs.LG · 10d ago

NeSyCat Torch: Differentiable Tensor Implementation for Neurosymbolic Learning

NeSyCat Torch provides a differentiable tensor implementation of categorical semantics for neurosymbolic learning, unifying classical, fuzzy, probabilistic, and neural systems under a single inductive truth definition. It outperforms LTN and DeepProbLog in speed and accuracy on MNIST addition and matches DeepStochLog's accuracy, while operating within a uniform framework extensible to continuous probability via monad instantiation.

arXiv cs.LG · 10d ago

P-K-GCN: Physics-augmented Koopman-enhanced Graph Convolutional Network

P-K-GCN enables high-fidelity spatiotemporal super-resolution on irregular geometries by combining graph convolutional networks with Koopman operator theory. It incorporates a physics-based loss to enforce adherence to physical laws, reducing super-resolution error through improved generalization and accuracy, as validated in cardiac electrodynamics reconstruction.

arXiv cs.LG · 10d ago

Diffusion-Proof: First Framework for Diffusion LLMs in Formal Theorem Proving

Diffusion-Proof is the first framework to train and apply diffusion language models for formal theorem proving. It introduces dLLM-Prover-7B for whole-proof writing with long-range coherence and dLLM-Corrector-7B for local proof correction using bidirectional information. The framework outperforms autoregressive LLM baselines by 1.61% on ProofNet-Test and 6.14% on MiniF2F-Test, and solves an IMO problem beyond the capability of DeepSeek-Prover-V2-7B.

arXiv cs.LG · 10d ago

UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based RL

UBP2 introduces a model-based method that actively explores environments by jointly reasoning over uncertainties in reward, dynamics, and value functions. It achieves superior sample efficiency in preference-based reinforcement learning, outperforming both model-free and non-optimistic model-based baselines on the Meta-World benchmark.

arXiv cs.AI · 10d ago
