DeepRubric introduces a data construction framework that builds query-rubric pairs by first defining verifiable evaluation targets through an evidence tree. It generates 9K supervision examples and trains an 8B model with GRPO, achieving performance comparable to state-of-the-art models while using 13x fewer RL GPU-hours.
arXiv cs.CL · 9d ago · research
DeepRubric: Efficient RL for Deep Research Agents
Importance 3/3
Beats a top-lab benchmark
New feature vs. leaders
New harness with differentiators
OpenAI
Google DeepMind
Mistral AI
AI agents
Reasoning models
Training methods
Benchmarks
| Benchmark | Model | Score |
|---|---|---|
| Multi-SWE-bench | DeepRubric-8B | — |
| SWE-bench | DeepRubric-8B | — |
| SWE-bench Verified | DeepRubric-8B | — |