Training methods
arXiv cs.LG · 9d ago

Discriminator-Guided RL Corrects Flow Matching with Data-Aligned Rewards

Discriminator-Guided RL (DRL) trains a discriminator in a pretrained representation space to separate real data from model-generated samples. The discriminator's logit serves as the reward in KL-regularized RL, aligning model outputs with visual and semantic realism without human preference labels. DRL improves FID and semantic FD across models such as SiT and JiT, and improves the Pareto frontier between preference and fidelity.
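
The reward mechanism described above can be illustrated with a toy sketch. This is not the paper's implementation: the linear discriminator, the feature vectors, the weights, and the finite candidate set are all hypothetical. It relies only on the standard result that, over a finite set, the policy maximizing E[r] - beta * KL(pi || pi_ref) is proportional to pi_ref(x) * exp(r(x) / beta).

```python
import math

def discriminator_logit(features, weights, bias):
    """Logit of a (hypothetical) linear discriminator; higher means 'looks more real'."""
    return sum(w * f for w, f in zip(weights, features)) + bias

def kl_regularized_policy(ref_probs, rewards, beta):
    """Closed-form maximizer of E[r] - beta * KL(pi || pi_ref) over a finite set.

    pi*(x) is proportional to pi_ref(x) * exp(r(x) / beta).
    """
    scaled = [r / beta for r in rewards]
    shift = max(scaled)  # subtract the max for numerical stability
    unnorm = [p * math.exp(s - shift) for p, s in zip(ref_probs, scaled)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical feature vectors for three generated samples.
candidates = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
weights, bias = [2.0, -1.0], 0.0          # assumed discriminator parameters
ref_probs = [1 / 3, 1 / 3, 1 / 3]         # reference (pretrained) model, uniform here

rewards = [discriminator_logit(f, weights, bias) for f in candidates]
policy = kl_regularized_policy(ref_probs, rewards, beta=1.0)
near_ref = kl_regularized_policy(ref_probs, rewards, beta=1e6)
```

A small `beta` lets the discriminator reward dominate, while a very large `beta` keeps the policy close to the reference model, which is the role the KL term plays in the summary above.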

arXiv cs.LG · 10d ago

MAST Enables Selective Unlearning in RLVR-Induced Reasoning

MAST, a mechanism-guided unlearning method, achieves targeted forgetting of RLVR-induced reasoning with minimal collateral damage. On Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, it reduces MATH performance (45/150 to 37/150) while GSM8K accuracy shifts by only +0.8 points and MATH retention by -0.5 points. Results hold across seeds, objectives, and models, and MAST is more stable than full-parameter unlearning.

arXiv cs.LG · 10d ago

NeSyCat Torch: Differentiable Tensor Implementation for Neurosymbolic Learning

NeSyCat Torch provides a differentiable tensor implementation of categorical semantics for neurosymbolic learning, unifying classical, fuzzy, probabilistic, and neural systems under a single inductive truth definition. On MNIST addition it outperforms LTN and DeepProbLog in speed and accuracy and matches DeepStochLog's accuracy, all within a uniform framework that extends to continuous probability via monad instantiation.
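
For readers unfamiliar with the MNIST addition benchmark, the sketch below shows the probabilistic reading of the task: given a classifier's distribution over each of two digits, the distribution over their sum is the convolution of the two. This is a generic illustration, not the NeSyCat Torch API; the function name and the example probabilities are assumptions.

```python
def sum_distribution(p_a, p_b):
    """Distribution of a + b for independent digits with distributions p_a and p_b."""
    out = [0.0] * (len(p_a) + len(p_b) - 1)
    for a, pa in enumerate(p_a):
        for b, pb in enumerate(p_b):
            out[a + b] += pa * pb
    return out

# Hypothetical classifier outputs over the digits 0..9 for two images.
p_first = [0.0] * 10
p_first[3], p_first[8] = 0.9, 0.1     # "probably a 3, maybe an 8"
p_second = [0.0] * 10
p_second[5], p_second[6] = 0.8, 0.2   # "probably a 5, maybe a 6"

p_sum = sum_distribution(p_first, p_second)   # 19 entries, for sums 0..18
most_likely_sum = max(range(len(p_sum)), key=p_sum.__getitem__)
```

Because every step is a sum of products, the same computation stays differentiable when written over tensors, so supervision on the sum alone can train the digit classifiers.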

arXiv cs.LG · 10d ago

Diffusion-Proof: First Framework for Diffusion LLMs in Formal Theorem Proving

Diffusion-Proof is the first framework to train and apply diffusion language models for formal theorem proving. It introduces dLLM-Prover-7B for whole-proof writing with long-range coherence and dLLM-Corrector-7B for local proof correction using bidirectional information. The framework outperforms autoregressive LLM baselines by 1.61% on ProofNet-Test and 6.14% on MiniF2F-Test, and solves an IMO problem that DeepSeek-Prover-V2-7B cannot.
