MAS-PromptBench: When Does Prompt Optimization Improve Multi-Agent LLM Systems?

This study systematically investigates the impact of system-prompt optimization on multi-agent systems (MAS) by benchmarking two optimizers across diverse configurations of tasks, workflows, and team sizes.

The research addresses a core challenge: the search space grows exponentially when prompt optimization is extended from a single LLM to a MAS.
Without any model fine-tuning, it evaluates the performance gains from prompt optimization across varying communication protocols and inter-agent coordination structures.
The work characterizes the sensitivity of these gains to specific system configurations and identifies open challenges in the field.
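
To make the search-space growth concrete, here is a minimal sketch. It assumes each agent's system prompt is picked independently from a fixed pool of candidates, which is an illustrative simplification and not the study's formulation. The function name and the pool size of 10 are hypothetical.

```python
def joint_prompt_space(candidates_per_agent: int, num_agents: int) -> int:
    """Count joint prompt assignments for a team of agents.

    Assumption (not from the study): every agent chooses one system prompt
    from its own pool of `candidates_per_agent` options, independently of
    the other agents, so the counts multiply.
    """
    if candidates_per_agent < 1 or num_agents < 1:
        raise ValueError("both arguments must be positive integers")
    return candidates_per_agent ** num_agents


if __name__ == "__main__":
    # Hypothetical pool of 10 candidate prompts per agent.
    for team_size in (1, 2, 4, 8):
        total = joint_prompt_space(10, team_size)
        print(f"{team_size} agent(s): {total} joint configurations")
```

Under this assumption, a single LLM has only as many configurations as there are candidate prompts, while each added agent multiplies the joint space by the pool size. That is why exhaustive search quickly becomes impractical for larger teams.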

The findings show that prompt optimization can yield significant performance improvements, and they clarify when, and by how much, it benefits multi-agent setups.