Benchmark · math

AIME 2024

5 results · 3 models
[Chart: AIME 2024 score (%) by date, 2026-06-18 to 2026-06-19; vertical axis 0 to 22; series: STARE, adaptive prompt selection mechanism, greedy router]
Timeline
  1. 2026-06-19 · adaptive prompt selection mechanism · 19.6% · Adaptive LLM Tutoring Improves Engagement and Efficiency
  2. 2026-06-19 · greedy router · 19.1% · Adaptive LLM Tutoring Improves Engagement and Efficiency
  3. 2026-06-18 · STARE · 4.0% · STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability
  4. 2026-06-18 · STARE · 4.0% · STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability
  5. 2026-06-18 · STARE · 4.0% · STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability