arxiv arXiv cs.CL · 8d ago · research

Routing Accuracy Degradation and Recovery in Enterprise Agent Systems

from English

As enterprise agent tool catalogs scale from 10 to 110 agents, routing accuracy drops 16--23 percentage points on under-specified requests. An oracle analysis identifies retrieval and confusion gaps, with embedding-based shortlisting recovering +10--11pp F1. A human-annotated study of 1,435 utterances confirms real-world recovery of +10--17pp despite lower absolute performance.

Importance 3/3 New harness with differentiators arXiv cs.CL OpenAI Anthropic Google DeepMind AI agents Evaluation & benchmarks Research paper

Benchmarks

Benchmark	Model	Score
SWE-bench Verified	three frontier models	—

Read original