arXiv cs.AI · 7d ago · research

Rubric-Conditioned Self-Distillation Framework


Rubric-Conditioned Self-Distillation is a framework that uses structured rubrics to give fine-grained, token-level feedback during self-distillation of reasoning language models. Conditioning teacher models on rubric-level criteria enables more precise credit assignment than a scalar reward. On average across science reasoning benchmarks, the method outperforms GRPO and OPSD by 1.0 and 0.9 points, respectively.
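To make the contrast between token-level feedback and a scalar reward concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: the summary does not state the training objective, so the per-token KL divergence between a teacher and a student distribution is an assumption, and the names `per_token_kl` and `scalar_reward_credit`, along with the toy logits, are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def per_token_kl(teacher_logits, student_logits):
    """Return KL(teacher || student) for each token position.

    Assumption: the teacher logits come from the same model prompted
    with the rubric criteria; the student logits come from the model
    without them. The summary does not confirm this exact loss.
    """
    signals = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row)
        q = softmax(s_row)
        signals.append(sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)))
    return signals


def scalar_reward_credit(reward, num_tokens):
    """A sequence-level reward gives every token the same signal."""
    return [reward] * num_tokens


# Toy example: 3 token positions over a 3-word vocabulary.
# The hypothetical rubric-conditioned teacher disagrees with the
# student only at position 1.
student_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
teacher_logits = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 2.0]]

token_signal = per_token_kl(teacher_logits, student_logits)
sequence_signal = scalar_reward_credit(0.0, len(student_logits))

print("token-level signal:   ", token_signal)
print("sequence-level signal:", sequence_signal)
```

In this toy setup the token-level signal is nonzero only at the position where teacher and student disagree, while the scalar reward assigns the same value to every token and cannot localize the error.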

Importance 3/3 · Beats a top-lab benchmark · New feature vs. leaders · arXiv cs.AI · Allen AI · Microsoft Research · OpenAI · Evaluation & benchmarks · Reasoning models · Training methods

Benchmarks

Benchmark	Model	Score
GSM8K	rubric-conditioned self-distillation	1.0 pts
MATH-500	rubric-conditioned self-distillation	0.9 pts
