arXiv cs.LG · 7d ago · src: 8d ago · research

TAPO: Self-Distillation with Micro-Reflective Trajectories

TAPO is a self-distillation method that constructs explicit micro-reflective trajectories: training sequences that retain the model's erroneous reasoning and insert a natural-language diagnosis of the error. Built from the model's own correct and incorrect rollouts, these trajectories provide fine-grained corrections anchored in the model's own reasoning. The authors report improvements in both first-pass reasoning and error correction relative to GRPO (Group Relative Policy Optimization).
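As a rough illustration of the idea, the Python sketch below assembles one micro-reflective trajectory from a paired incorrect and correct rollout. This is a minimal sketch under stated assumptions, not the paper's implementation: the `Rollout` type, the `first_divergence` and `build_micro_reflective_trajectory` helpers, the step-level string comparison, and the "Wait:" prefix are all hypothetical, and the diagnosis text is passed in by the caller rather than generated by a model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    """One sampled reasoning trace (hypothetical container, not from the paper)."""
    steps: List[str]   # reasoning steps, in order
    is_correct: bool   # whether the final answer was verified as correct


def first_divergence(a: List[str], b: List[str]) -> int:
    """Return the index of the first step at which the two step lists differ."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One list is a prefix of the other (or they are identical).
    return min(len(a), len(b))


def build_micro_reflective_trajectory(
    wrong: Rollout, right: Rollout, diagnosis: str
) -> List[str]:
    """Keep the erroneous step, insert a diagnosis, then continue correctly.

    Assumption: the error is located at the first step where the incorrect
    rollout departs from the correct one.
    """
    if wrong.is_correct or not right.is_correct:
        raise ValueError("expected one incorrect and one correct rollout")

    k = first_divergence(wrong.steps, right.steps)
    shared_prefix = wrong.steps[:k]
    erroneous_step = wrong.steps[k:k + 1]   # retained, not deleted
    reflection = [f"Wait: {diagnosis}"]     # natural-language diagnosis
    corrected_tail = right.steps[k:]        # resume from the correct rollout
    return shared_prefix + erroneous_step + reflection + corrected_tail


if __name__ == "__main__":
    wrong = Rollout(["Let x = 3.", "Then 2x = 5.", "So the answer is 5."], False)
    right = Rollout(["Let x = 3.", "Then 2x = 6.", "So the answer is 6."], True)
    for step in build_micro_reflective_trajectory(
        wrong, right, "2 times 3 is 6, not 5."
    ):
        print(step)
```

The resulting sequence keeps the mistaken step in context, so a model trained on it sees both the error and its correction. How the actual method locates errors and generates diagnoses is not specified in this summary.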

Benchmarks

Benchmark	Model	Score
AIME 2024	TAPO	—
AIME 2025	TAPO	—
