arXiv cs.CL · 8d ago · research

Automated Prompt Optimization for LLM Game Agents

A new framework automates prompt refinement for LLM game agents by splitting the observation-to-action pipeline into goal-conditioned and action-selection modules. An LLM-driven evolutionary loop iteratively rewrites the prompts based on environment feedback. The framework reaches up to 72.5% success on PutNext, a task where prior agents failed, without any model fine-tuning.
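
The evolutionary loop described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the paper's implementation: `evolve_prompts`, `toy_mutate`, `toy_evaluate`, the `HINTS` list, and the keep-the-top-half selection rule are all hypothetical. In a real setup, `mutate` would call an LLM to rewrite a prompt given its feedback, and `evaluate` would run the agent in the environment and return its success rate. The sketch also evolves a single prompt string rather than one prompt per module.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for what an LLM rewrite might add to a prompt.
HINTS = [
    "pick up the object first",
    "move next to the target",
    "drop the object adjacent to the target",
]


def toy_mutate(prompt: str, score: float) -> str:
    """Stand-in for an LLM rewrite: append the first hint the prompt lacks."""
    for hint in HINTS:
        if hint not in prompt:
            return f"{prompt} {hint}."
    return prompt  # nothing left to add


def toy_evaluate(prompt: str) -> float:
    """Stand-in for environment feedback: fraction of hints the prompt contains."""
    return sum(hint in prompt for hint in HINTS) / len(HINTS)


def evolve_prompts(
    seed_prompts: List[str],
    mutate: Callable[[str, float], str],
    evaluate: Callable[[str], float],
    generations: int = 3,
    population_size: int = 4,
) -> Tuple[str, float]:
    """Each generation: keep the top-scoring prompts, then add one mutated child per survivor."""
    population = [(evaluate(p), p) for p in seed_prompts]
    for _ in range(generations):
        population.sort(key=lambda pair: pair[0], reverse=True)
        survivors = population[: max(1, population_size // 2)]
        children = []
        for score, prompt in survivors:
            child = mutate(prompt, score)
            children.append((evaluate(child), child))
        population = survivors + children
    best_score, best_prompt = max(population, key=lambda pair: pair[0])
    return best_prompt, best_score


best_prompt, best_score = evolve_prompts(
    ["You are an agent in a grid world."], toy_mutate, toy_evaluate
)
print(best_score, best_prompt)
```

Because selection only needs a score for each candidate prompt, a loop of this shape requires no gradient access and no fine-tuning of the underlying model.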

Benchmarks

Benchmark	Model	Success rate
PutNext	Proposed framework	72.5%
PutNext	RobustCoTAgent	0%
