Benchmark · math

PutnamBench

2 results · 2 models

RobustCoTAgent · our framework

Timeline

2026-06-17 · RobustCoTAgent · 0.0% · Automated Prompt Optimization for LLM Game Agents
2026-06-17 · our framework · 72.5% · Automated Prompt Optimization for LLM Game Agents