Simon Willison used Claude Code with the Fable 5 model to automate the evaluation and optimization of system prompts for Datasette Agent, focusing on its read-only SQL query execution feature. The process involved installing the latest Datasette alpha and DSPy, then looking for weaknesses in how the agent handles schema information.

  • The automated research task used the GPT-4.1 mini and GPT-4.1 nano models to test prompt variations.
  • Analysis revealed that excluding column names from schema listings caused column-name guessing and error-retry loops.
  • A key finding was that the prompt's advice against calling describe_table when information is already available led the models to guess column names incorrectly, for example page_count or o.order_id.
  • The proposed solution involves including column names directly in the prompt's schema listing or softening the restriction on table description calls.
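
The first proposed fix, listing column names alongside each table in the prompt, can be sketched with Python's built-in sqlite3 module. This is only an illustration of the idea and not Datasette Agent's actual prompt-building code: the function name schema_listing, the table(col, ...) output format, and the example tables are all assumptions made for this sketch.

```python
import sqlite3


def schema_listing(conn: sqlite3.Connection) -> str:
    """Return one line per table in the form 'table(col1, col2, ...)'.

    Hypothetical helper: the name and output format are assumptions,
    not part of Datasette or Datasette Agent.
    """
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    lines = []
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        escaped = table.replace('"', '""')
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{escaped}")')]
        lines.append(f"{table}({', '.join(columns)})")
    return "\n".join(lines)


# Made-up example schema, used only to show the output shape.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, pages INTEGER)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

listing = schema_listing(conn)
print(listing)
# books(id, title, pages)
# orders(id, customer_id, total)
```

With a listing like this in the system prompt, a model can read the real column names directly instead of guessing them, which is the failure mode described above.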

This approach demonstrates how automated agents can systematically identify and resolve specific failure modes in LLM system prompts, improving reliability for data querying tasks.