Reasoning models
arXiv cs.CL · 8d ago

GraphPO: Graph-based Policy Optimization for Reasoning Models

GraphPO introduces a directed acyclic graph framework to represent reasoning rollouts, merging semantically equivalent paths to reduce redundant exploration. It assigns efficiency and correctness advantages to edges, improving inference efficiency and process supervision while reducing advantage-estimation variance. Experiments show GraphPO outperforms chain- and tree-based methods on three LLMs across reasoning and agentic search tasks under identical token or response budgets.
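To make the merging idea concrete, here is a minimal sketch of collapsing several chain rollouts into one directed acyclic graph. It is not GraphPO's implementation: the equivalence test (lowercasing and whitespace normalization), the depth-indexed node keys, and the names `normalize` and `build_rollout_graph` are all assumptions made for illustration. The summary does not say how semantic equivalence is decided or how edge advantages are computed.

```python
from collections import defaultdict

ROOT = (0, "<root>")  # hypothetical shared start node for every rollout


def normalize(step: str) -> str:
    """Stand-in equivalence key (assumption): lowercase, collapse whitespace."""
    return " ".join(step.lower().split())


def build_rollout_graph(rollouts):
    """Merge chain rollouts into a DAG.

    Nodes are (depth, normalized_step) pairs, so every edge goes from
    depth d to depth d + 1 and the graph cannot contain a cycle.
    Returns a dict mapping each edge to the number of rollouts using it.
    """
    edge_counts = defaultdict(int)
    for steps in rollouts:
        parent = ROOT
        for depth, step in enumerate(steps, start=1):
            child = (depth, normalize(step))
            edge_counts[(parent, child)] += 1
            parent = child
    return dict(edge_counts)


rollouts = [
    ["Let x = 2", "Then x + 3 = 5", "Answer: 5"],
    ["let x = 2", "Then  x + 3 = 5", "Answer: 5"],
    ["Let x = 2", "Compute 2 + 3", "Answer: 5"],
]
graph = build_rollout_graph(rollouts)
total_steps = sum(len(r) for r in rollouts)
print(f"{total_steps} chain steps merged into {len(graph)} edges")
```

In this toy case the three rollouts share their first and last steps, so the graph stores each shared step once and records how many rollouts passed through it. A per-edge signal could then be attached to these shared edges rather than to each chain separately.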

arXiv cs.CL · 8d ago

Index Sickness Elimination via Baseline-Log Physical Separation

In a 391-session AI collaboration project, LLMs exhibited 'Index Sickness', a failure mode in which symbolic complexity leads to self-referential outputs disconnected from reality. The 'Pang Principle' asserts that natural language conveys higher semantic quality than symbolic systems. The 'Baseline-Log Physical Separation' mechanism reduced AI instruction volume by 75% and eliminated recurrence of Index Sickness in subsequent sessions.

arXiv cs.CL · 8d ago

Human-AI Coevolution Framework Reveals Social Intelligence Emergence

The Human-AI Coevolution Dynamics Framework (HACD-H) introduces a unified model for long-term human-AI interaction, integrating emotional adaptation, memory, and personality into a self-organizing social cognitive system. Results show social intelligence emerges through coevolution, with a significant negative correlation between social intelligence and social cognitive energy (r = -0.391, p < 0.001), and progressive energy reduction over time in interaction trajectories.

arXiv cs.AI · 8d ago

Data Recipe Boosts Long-Context Reasoning in LLMs

A data-centric approach improves long-context reasoning in large language models, using eight curated datasets with 14K examples across retrieval, multi-evidence synthesis, and reasoning tasks. When paired with minimal outcome-based GRPO training, it achieves average gains of +6.4 to +7.2 points on seven benchmarks, outperforming prior RL training sets, and enhances agentic performance by +4.8 and +7.0 points on GAIA and BrowseComp respectively.
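For readers unfamiliar with outcome-based GRPO, the core step is the group-relative advantage: several responses are sampled for the same prompt, each receives a single outcome reward, and each reward is standardized against its own group. The sketch below shows only that step. The function name, the use of the population standard deviation, and the `eps` constant are illustrative assumptions, and nothing here reflects the specific training setup of the summarized work.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize outcome rewards within one group of sampled responses.

    rewards: one scalar per response to the same prompt
             (e.g. 1.0 if the final answer is correct, else 0.0).
    eps:     small constant (assumption) that avoids division by zero
             when every response in the group gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one prompt: two correct, two incorrect.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every answer is correct carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Correct responses get positive advantages and incorrect ones negative, while a group with identical rewards yields all zeros.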