A new framework automates prompt refinement for LLM agents by splitting the observation-to-action pipeline into goal-conditioned and action-selection modules. An LLM-driven evolutionary loop iteratively improves the prompts based on environment feedback. Without any model fine-tuning, the framework reaches up to 72.5% success on PutNext, a task where prior agents failed.
arXiv cs.CL · 8d ago · research
Automated Prompt Optimization for LLM Game Agents
Importance 3/3
Beats a top-lab benchmark
New feature vs. leaders
New harness with differentiators
OpenAI
Google DeepMind
Mistral AI
AI agents
Evaluation & benchmarks
Reasoning models
Benchmarks
| Benchmark | Agent | Success rate |
|---|---|---|
| PutNext | Proposed framework | 72.5% |
| PutNext | RobustCoTAgent | 0% |
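
The evolutionary loop mentioned in the summary can be pictured with a minimal sketch. This is not the paper's implementation: `evolve_prompt`, `toy_mutate`, `toy_evaluate`, and `HINTS` are hypothetical names introduced here for illustration. In a real setup, the mutation step would presumably be an LLM call that rewrites the prompt using feedback, and the evaluation step would run episodes in the environment and return a success rate. Here both are replaced by deterministic toy stand-ins, and a single prompt is optimized rather than one per module.

```python
import random
from typing import Callable, Tuple


def evolve_prompt(
    seed_prompt: str,
    mutate: Callable[[str, float], str],
    evaluate: Callable[[str], float],
    generations: int = 5,
    population_size: int = 4,
) -> Tuple[str, float]:
    """Greedy evolutionary search: keep the best prompt, propose variants of it."""
    best_prompt, best_score = seed_prompt, evaluate(seed_prompt)
    for _ in range(generations):
        # Propose several variants of the current best prompt.
        candidates = [mutate(best_prompt, best_score) for _ in range(population_size)]
        for candidate in candidates:
            score = evaluate(candidate)
            # Accept a variant only if it strictly improves the feedback score.
            if score > best_score:
                best_prompt, best_score = candidate, score
    return best_prompt, best_score


# Toy stand-ins (assumptions, not from the paper).
HINTS = [
    "pick up the object first",
    "move next to the target",
    "drop the object",
]
_rng = random.Random(0)  # fixed seed so the toy run is repeatable


def toy_mutate(prompt: str, score: float) -> str:
    """Stand-in for an LLM rewrite: append one hint the prompt still lacks."""
    missing = [hint for hint in HINTS if hint not in prompt]
    if not missing:
        return prompt
    return prompt + " " + _rng.choice(missing)


def toy_evaluate(prompt: str) -> float:
    """Stand-in for environment feedback: fraction of hints the prompt contains."""
    return sum(hint in prompt for hint in HINTS) / len(HINTS)


if __name__ == "__main__":
    prompt, score = evolve_prompt("You are a game agent.", toy_mutate, toy_evaluate)
    print(score)
    print(prompt)
```

In this toy setup, each generation can add at most one missing hint to the best prompt, so the score climbs from 0 to 1.0 within three generations and then stays there. Accepting only strict improvements is one simple selection rule; the paper's actual selection and mutation strategy may differ.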