Reasoning models
arXiv cs.CL · just now

OPERA: Aligning Open-Ended Reasoning via Objective Perplexity-based Reinforcement Learning

The OPERA framework addresses the instability of applying reinforcement learning to open-ended tasks by replacing external judge models with intrinsic rewards derived from perplexity dynamics. The reward quantifies how much uncertainty drops at critical reflective states, which avoids the stylistic biases and positional inconsistencies common in LLM-as-a-judge systems. During the cold-start phase, the method uses guiding words to synthesize diverse reasoning traces and applies perplexity-prioritized rollouts to identify logically consistent branches. This pipeline produces a dataset of 20,000 high-quality reasoning trajectories for training. Applied to Qwen3-8B, OPERA establishes a new state of the art among open-source models and matches or surpasses proprietary models such as Gemini 2.5 and MiniMax-M2.5 on specific open-ended tasks. Empirical evaluations confirm the scalability and efficacy of this objective, perplexity-based alignment strategy.
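The summary does not give OPERA's reward formula. As a hedged illustration only, the sketch below assumes that "uncertainty reduction at a reflective state" can be measured as the drop in log-perplexity of the same target text, scored once before and once after the reflective step, and that "perplexity-prioritized rollouts" means keeping the lowest-perplexity branches. The function names `perplexity`, `uncertainty_reduction_reward`, and `prioritize_rollouts` are hypothetical, and per-token log-probabilities are assumed to come from whatever model is being trained.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(-mean natural-log probability) of a token sequence."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def uncertainty_reduction_reward(logprobs_before: list[float],
                                 logprobs_after: list[float]) -> float:
    """Hypothetical intrinsic reward: how much the log-perplexity of the same
    target text falls once a reflective step is inserted into the context.
    Positive values mean the reflection made the target more predictable."""
    return math.log(perplexity(logprobs_before)) - math.log(perplexity(logprobs_after))


def prioritize_rollouts(branches: dict[str, list[float]], k: int) -> list[str]:
    """Keep the k branch names whose token log-probs give the lowest perplexity."""
    return sorted(branches, key=lambda name: perplexity(branches[name]))[:k]


if __name__ == "__main__":
    before = [math.log(0.25)] * 3  # target scored without the reflective step
    after = [math.log(0.5)] * 3    # same target scored after the reflective step
    print(uncertainty_reduction_reward(before, after))  # log 4 - log 2 = log 2
```

Using log-perplexity keeps the reward on the scale of an average log-probability difference, so it does not blow up for long or very uncertain sequences; the actual OPERA reward may be shaped differently.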

arXiv cs.LG · 16h ago

TASER: Task-Differentiated Skill Expansion for Heterogeneous Continual Learning

TASER introduces a framework that dynamically expands and routes atomic skills for continual learning across highly heterogeneous tasks. Its skill-detection and routing mechanisms keep skills semantically distinct and allocate capacity efficiently, which reduces catastrophic forgetting while improving plasticity. Evaluated on HeteroCLBench, a benchmark of 19 diverse tasks spanning 9 cognitive dimensions, TASER outperforms existing baselines.
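The expand-or-route idea can be illustrated with a small sketch. It assumes, hypothetically, that each skill is represented by a prototype embedding, that an incoming task is routed to the most cosine-similar skill, and that a new skill is added when no existing one clears a similarity threshold. `SkillLibrary`, `route`, and `threshold` are illustrative names, not TASER's API, and a real system would also allocate new trainable parameters for each added skill.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class SkillLibrary:
    """Hypothetical skill pool keyed by prototype embeddings."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.prototypes: list[list[float]] = []

    def route(self, task_embedding: list[float]) -> tuple[int, bool]:
        """Return (skill_index, expanded).

        Reuses the most similar existing skill if its similarity reaches the
        threshold; otherwise appends a new skill, leaving existing prototypes
        untouched so earlier tasks keep their routing.
        """
        if self.prototypes:
            scores = [cosine(task_embedding, p) for p in self.prototypes]
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] >= self.threshold:
                return best, False
        self.prototypes.append(list(task_embedding))
        return len(self.prototypes) - 1, True


if __name__ == "__main__":
    lib = SkillLibrary(threshold=0.8)
    print(lib.route([1.0, 0.0, 0.0]))  # first task creates skill 0
    print(lib.route([0.9, 0.1, 0.0]))  # similar task reuses skill 0
    print(lib.route([0.0, 1.0, 0.0]))  # dissimilar task creates skill 1
```

The threshold is the knob that trades plasticity (more new skills) against capacity use (more sharing); how TASER sets or learns this trade-off is not stated in the summary.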

arXiv cs.LG · 18h ago

Atomistic Language Models Understand and Generate Materials

Atomistic Language Models (ALMs) unify natural language and atomistic structures, enabling language-driven crystal generation and optimization. ALMs use a continuous bridge to map language embeddings into the steering space of an atomistic diffusion model, and they apply a Text-to-Crystal Feynman-Kac procedure to improve stoichiometric accuracy. The accompanying ALM Bench benchmark evaluates text-conditioned material generation and optimization; code and weights are to be released soon.
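Feynman-Kac steering of diffusion models generally runs several candidate samples (particles) in parallel and periodically resamples them with weights given by a potential function. A minimal sketch of one such resampling step follows, assuming candidates are plain element-count dictionaries and assuming a hypothetical potential `exp(-lam * stoich_error)` that rewards matching a target reduced formula. The diffusion denoising between resampling steps is omitted, and `reduced_formula`, `stoich_error`, `fk_resample`, and `lam` are illustrative names, not the paper's potential or sampler.

```python
import math
import random
from functools import reduce
from math import gcd


def reduced_formula(counts: dict[str, int]) -> dict[str, int]:
    """Divide element counts by their GCD, e.g. Li2Fe2P2O8 -> LiFePO4."""
    counts = {el: n for el, n in counts.items() if n > 0}  # drop absent elements
    g = reduce(gcd, counts.values())
    return {el: n // g for el, n in counts.items()}


def stoich_error(counts: dict[str, int], target: dict[str, int]) -> int:
    """Sum of absolute differences between reduced formulas (0 = exact match)."""
    a, b = reduced_formula(counts), reduced_formula(target)
    return sum(abs(a.get(el, 0) - b.get(el, 0)) for el in set(a) | set(b))


def fk_resample(particles, target, lam=1.0, rng=None):
    """One hypothetical Feynman-Kac-style resampling step.

    Each particle gets weight exp(-lam * stoich_error); particles are then
    drawn with replacement in proportion to those weights, so candidates
    closer to the target stoichiometry tend to be duplicated.
    """
    rng = rng or random.Random(0)
    weights = [math.exp(-lam * stoich_error(p, target)) for p in particles]
    return rng.choices(particles, weights=weights, k=len(particles))


if __name__ == "__main__":
    target = {"Fe": 2, "O": 3}  # Fe2O3
    particles = [{"Fe": 4, "O": 6}, {"Fe": 1, "O": 1}, {"Fe": 3, "O": 4}]
    print(fk_resample(particles, target, lam=2.0))
```

Larger `lam` steers harder toward exact stoichiometry at the cost of particle diversity, which is the usual trade-off in particle-based steering.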

arXiv cs.LG · 18h ago

BIPC Framework Accelerates Mixed-Integer Optimization with Machine Learning

The BIPC framework reduces solution time for large-scale mixed-integer programs by identifying a backdoor subset of variables that drives computational complexity. Using supervised learning, it predicts values and intervals for these backdoor variables, then solves a reduced problem constrained by the predictions, achieving significant speedups with minimal loss in solution quality. This enables rapid, high-quality solutions under parameter perturbations in real-world systems such as power grids and supply chains.
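The reduce-then-solve step can be sketched on a toy integer program. In this sketch, brute-force enumeration stands in for a real MIP solver, and the "predictions" are hand-written intervals rather than the output of a trained model. `solve_by_enumeration` and `restrict_bounds` are hypothetical helpers, and the example problem is chosen only because its optimum is easy to verify.

```python
from itertools import product


def solve_by_enumeration(c, A, b, bounds):
    """Maximize c.x subject to A x <= b over integer boxes (stand-in for a MIP solver).

    Returns (best_x, best_obj, n_evaluated).
    """
    best_x, best_obj, n = None, float("-inf"), 0
    for x in product(*(range(lo, hi + 1) for lo, hi in bounds)):
        n += 1
        feasible = all(
            sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i
            for row, b_i in zip(A, b)
        )
        if feasible:
            obj = sum(c_j * x_j for c_j, x_j in zip(c, x))
            if obj > best_obj:
                best_x, best_obj = x, obj
    return best_x, best_obj, n


def restrict_bounds(bounds, predictions):
    """Tighten bounds of predicted backdoor variables (index -> (lo, hi) interval)."""
    new = list(bounds)
    for j, (lo, hi) in predictions.items():
        new[j] = (max(bounds[j][0], lo), min(bounds[j][1], hi))
    return new


if __name__ == "__main__":
    c = [5, 4, 3]
    A = [[2, 3, 1], [4, 1, 2], [3, 4, 2]]
    b = [5, 11, 8]
    bounds = [(0, 3)] * 3
    predictions = {0: (2, 3), 2: (0, 1)}  # hypothetical learned intervals
    reduced = restrict_bounds(bounds, predictions)
    print(solve_by_enumeration(c, A, b, bounds))   # ((2, 0, 1), 13, 64)
    print(solve_by_enumeration(c, A, b, reduced))  # ((2, 0, 1), 13, 16)
```

Because the predicted intervals here contain the true optimum, the reduced problem finds it after evaluating 16 points instead of 64. If a prediction excluded the optimum, the reduced problem would return a worse solution (or none at all), which is why accurate interval prediction matters for keeping quality loss small.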