A data-centric recipe improves long-context reasoning in large language models. It uses eight curated datasets totaling 14K examples that span retrieval, multi-evidence synthesis, and reasoning tasks. Paired with minimal outcome-based GRPO (Group Relative Policy Optimization) training, it yields average gains of +6.4 to +7.2 points across seven benchmarks, outperforming prior RL training sets, and improves agentic performance by +4.8 points on GAIA and +7.0 points on BrowseComp.
arXiv cs.AI · 7d ago · research
Data Recipe Boosts Long-Context Reasoning in LLMs
Importance 3/3
Beats a top-lab benchmark
Alibaba (Qwen)
AI agents
Reasoning models
Training data
Benchmarks
| Benchmark | Model | Gain (points) |
|---|---|---|
| SWE-bench | Qwen3-4B | +7.2 |
| BrowseComp | Qwen3-4B | +7.0 |
| SWE-bench | Qwen3-30B-A3B | +6.4 |
| GAIA | Qwen3-4B | +4.8 |
| SWE-bench | Qwen3-8B | +3.2 |
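
The summary mentions outcome-based GRPO training without describing it. As a rough illustration only, the sketch below shows the general idea behind group-relative advantages: several responses are sampled for one prompt, each gets a reward based solely on its final answer, and each reward is normalized against the group. The exact-match reward, the population standard deviation, the `eps` value, and the function names are all assumptions made for this example, not details taken from the paper.

```python
from statistics import mean, pstdev


def outcome_reward(predicted: str, reference: str) -> float:
    """Hypothetical outcome reward: 1.0 on a normalized exact match, else 0.0."""
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and spread of its own group.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every response in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt whose reference answer is "Paris".
answers = ["Paris", "Lyon", "paris ", "Marseille"]
rewards = [outcome_reward(a, "Paris") for a in answers]
advantages = group_relative_advantages(rewards)
```

Correct answers end up with positive advantages and incorrect ones with negative advantages. When every response in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal.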