Reasoning models
arXiv cs.CL · 2d ago

Group-Graph Policy Optimization for Long-Horizon Agentic RL

Group-Graph Policy Optimization (G2PO) targets long-horizon agentic reinforcement learning by transforming interaction trajectories into state-transition graphs. The graphs enable group-aggregated state-value estimation and edge-centric advantage calculation, which improve credit assignment and reduce variance. G2PO achieves success-rate improvements of up to 22.2% over GRPO on the WebShop, ALFWorld, and AppWorld benchmarks.
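
To make the graph idea concrete, here is a minimal Python sketch of one way group-aggregated state values and edge-level advantages could be computed. It is an illustration under stated assumptions, not the paper's formulation: the summary gives no formulas, so the mean return-to-go estimator, the one-step advantage r + gamma * V(s') - V(s), the hashable state keys, and the function names `group_state_values` and `edge_advantages` are all hypothetical.

```python
from collections import defaultdict

# A trajectory is a list of (state, action, reward, next_state) steps.
# States must be hashable so identical states from different rollouts
# collapse into a single graph node (an assumption of this sketch).


def group_state_values(trajectories, gamma=1.0):
    """Average the return-to-go of every visit to each state across the group."""
    returns_by_state = defaultdict(list)
    for traj in trajectories:
        running = 0.0
        returns_to_go = []
        # Walk backwards to accumulate the discounted return from each step.
        for _, _, reward, _ in reversed(traj):
            running = reward + gamma * running
            returns_to_go.append(running)
        returns_to_go.reverse()
        for (state, _, _, _), ret in zip(traj, returns_to_go):
            returns_by_state[state].append(ret)
    return {s: sum(rs) / len(rs) for s, rs in returns_by_state.items()}


def edge_advantages(trajectories, values, gamma=1.0):
    """Assign each edge (state, action, next_state) a one-step advantage."""
    advantages = {}
    for traj in trajectories:
        for state, action, reward, next_state in traj:
            # States with no outgoing edge (terminals) default to value 0.
            next_value = values.get(next_state, 0.0)
            advantages[(state, action, next_state)] = (
                reward + gamma * next_value - values[state]
            )
    return advantages


# Two rollouts from the same start state; only the first one succeeds.
group = [
    [("s0", "a", 0.0, "s1"), ("s1", "b", 1.0, "done")],
    [("s0", "c", 0.0, "s2"), ("s2", "d", 0.0, "done")],
]
values = group_state_values(group)
advantages = edge_advantages(group, values)
```

In this toy group the shared start state gets a value of 0.5, so the edge leading toward the successful branch receives a positive advantage (0.5) and the edge leading toward the failed branch a negative one (-0.5), while later edges score 0 because their outcome is already reflected in the state value. That is the kind of per-step credit assignment a single trajectory-level reward cannot provide.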

arXiv cs.CL · 2d ago

Dual-Track Framework for Template-Constrained LaTeX Conversion

A new Dual-Track Framework decouples template formatting from document processing. An offline track extracts template constraints into a reusable manifest, and an online track runs a hybrid pipeline that limits LLM use to reasoning tasks such as metadata and bibliographic handling while applying rule-based engines to deterministic operations. The framework improves structural fidelity, layout compliance, and compilation success over baseline methods.

arXiv cs.CL · 2d ago

Language shapes historical credit in large language models

A study of 11 large language models across 21 disputed inventions shows that query language systematically influences which inventor is credited. Lower-status claimants appear more frequently when questions are phrased in their native language, while dominant Anglophone figures are credited consistently across query languages. The findings suggest that language acts as a switch that activates distinct national versions of history, indicating that LLMs function as systems of cultural memory.