Benchmark · math

PutnamBench

2 results · 2 models
[Chart: PutnamBench score by date. RobustCoTAgent: 0 (2026-06-17); our framework: 72.5 (2026-06-17)]
Timeline
  1. 2026-06-17 · RobustCoTAgent · 0.0% · Automated Prompt Optimization for LLM Game Agents
  2. 2026-06-17 · our framework · 72.5% · Automated Prompt Optimization for LLM Game Agents