MECoBench: a benchmark for multimodal embodied agent collaboration

Researchers introduce MECoBench, a multimodal embodied cooperation benchmark designed to evaluate the collaborative capabilities of multimodal large language models (MLLMs) in visually grounded environments. The platform spans diverse real-world tasks and includes two cooperation structures alongside three distinct collaboration modes.

Extensive experiments reveal that collaboration generally improves task completion, but the net benefit depends on whether those gains outweigh the added coordination complexity.
Communication proves essential to successful collaboration, and the optimal mode varies with team size and model capability.
The experiments also show that collaboration enhances robustness under noisy priors and exploration conditions.

MECoBench provides a systematic testbed for understanding the mechanisms and limits of multimodal embodied collaboration.