Benchmark · math

GSM8K

saturated · 5 results · 4 models
[Chart: GSM8K results by date; series: SC-GRPO, DAPO, rubric-conditioned self-distillation, LLMs]
Timeline
  1. 2026-06-19 · LLMs · 98.0% · GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs
  2. 2026-06-18 · rubric-conditioned self-distillation · 1.0 pts · Rubric-Conditioned Self-Distillation Framework
  3. 2026-06-18 · rubric-conditioned self-distillation · 1.0 pts · Rubric-Conditioned Self-Distillation Framework
  4. 2026-06-18 · SC-GRPO · 8.1% · Self-Conditioned Credit Assignment for RL with Verifiable Rewards
  5. 2026-06-18 · DAPO · 5.9% · Self-Conditioned Credit Assignment for RL with Verifiable Rewards