arXiv cs.CL · 9d ago · research

DeepRubric: Efficient RL for Deep Research Agents

DeepRubric introduces a data construction framework that builds query-rubric pairs by first defining verifiable evaluation targets through an evidence tree. It generates 9K supervision examples and trains an 8B model with GRPO, achieving performance comparable to state-of-the-art models while using 13x fewer RL GPU-hours.
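
To make the rubric-plus-GRPO idea concrete, here is a minimal sketch of how rubric checks could be turned into group-relative advantages. It is an illustration under stated assumptions, not the paper's implementation: the reward (fraction of rubric items satisfied), the function names `rubric_score` and `group_relative_advantages`, and the toy data are all hypothetical, and the normalization shown is the commonly described GRPO form (reward minus group mean, divided by group standard deviation).

```python
from statistics import mean, pstdev


def rubric_score(satisfied):
    """Hypothetical reward: the fraction of rubric items a response satisfies."""
    return sum(satisfied) / len(satisfied)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses sampled for the same query.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every response in the group scores the same.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy data: four sampled responses to one query, each checked against
# the same four rubric items (True means the item is satisfied).
group = [
    [True, True, True, True],
    [True, True, False, False],
    [True, False, False, False],
    [False, False, False, False],
]

rewards = [rubric_score(checks) for checks in group]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Responses that satisfy more rubric items than the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to produce a learning signal.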

Importance 3/3 · Beats a top-lab benchmark · New feature vs. leaders · New harness with differentiators · arXiv cs.CL · OpenAI · Google DeepMind · Mistral AI · AI agents · Reasoning models · Training methods

Benchmarks

Benchmark	Model	Score
Multi-SWE-bench	DeepRubric-8B	—
SWE-bench	DeepRubric-8B	—
SWE-bench Verified	DeepRubric-8B	—
