arXiv cs.AI · 8d ago

MAST Enables Selective Unlearning in RLVR-Induced Reasoning

MAST, a mechanism-guided unlearning method, achieves targeted forgetting of RLVR-induced reasoning with minimal collateral damage. On Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, it reduces MATH performance (45/150 to 37/150) while preserving GSM8K accuracy (+0.8 points) and MATH retention (-0.5 points). Results hold across seeds, objectives, and models, and show greater stability than full-parameter unlearning.

arXiv cs.AI · 8d ago

ScenA: Reference-Driven Multi-Speaker Audio Scene Generation

ScenA conditions a text-to-audio foundation model on multiple reference voices and a natural language scene prompt to generate realistic multi-speaker conversations. It addresses the 'Reference Shortcut' issue by using a high-noise-biased training schedule, ensuring speaker assignment relies on text prompts rather than acoustic similarity. Evaluated on CoVoMix2-Dialogue, ScenA outperforms existing systems in speaker-binding and produces rich, naturalistic audio with overlapping speech and ambient noise.

arXiv cs.CL · 8d ago

PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding

PragReST is a self-supervised framework that enhances large language models' pragmatic reasoning by generating counterfactual reasoning traces and training on them via supervised fine-tuning and reinforcement learning. It outperforms baseline models on four pragmatic benchmarks, improving the accuracy of Qwen3-8B and Qwen3-14B by 5.37% and 5.50%, respectively, and maintains strong performance on general-knowledge and mathematical reasoning tasks.