Researchers introduce Agents-A1, a 35B Mixture-of-Experts model that achieves performance comparable to trillion-parameter models by scaling the agent horizon rather than parameter count. The approach focuses on extending long-horizon trajectories and unifying heterogeneous agent abilities through a specialized training infrastructure.
- Agents-A1 uses a long-horizon knowledge-action infrastructure that produces agentic trajectories averaging 45K tokens in length.
- Training follows a three-stage recipe: full-domain supervised fine-tuning, domain-level teacher model training, and multi-teacher domain-routed on-policy distillation.
- The model unifies six heterogeneous domains into a single deployable student model using salient vocabulary alignment for efficient knowledge transfer.
- Agents-A1 outperforms trillion-parameter models such as Kimi-K2.6 and DeepSeek-V4-pro on benchmarks including SEAL-0 (56.4), IFBench (80.6), HiPhO (46.4), FrontierScience-Olympiad (79.0), and MolBench-Bind (56.8).
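
The third training stage, multi-teacher domain-routed on-policy distillation, can be illustrated with a toy sketch. The summary gives no implementation details, so everything here is an assumption made for illustration: the domain names, the function names, the use of reverse KL as the distillation loss, and in particular the top-k token restriction, which only stands in for "salient vocabulary alignment" and is not the authors' method. The idea shown is that the student generates its own trajectory, the teacher for that trajectory's domain scores the same steps, and the loss compares the two distributions on a small set of tokens.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_ids(probs, k):
    """Indices of the k most probable tokens."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

def reverse_kl(student_probs, teacher_probs, token_ids):
    """KL(student || teacher) over token_ids, after renormalizing both."""
    s = [student_probs[i] for i in token_ids]
    t = [teacher_probs[i] for i in token_ids]
    s_sum, t_sum = sum(s), sum(t)
    return sum(
        (si / s_sum) * math.log((si / s_sum) / (ti / t_sum))
        for si, ti in zip(s, t)
    )

def routed_distill_loss(domain, student_logits, teacher_logits_by_domain, k=2):
    """Average per-step loss against the teacher selected by `domain`.

    student_logits: per-step logits on a trajectory the student sampled itself
    (the on-policy part). teacher_logits_by_domain: hypothetical mapping from
    a domain name to that teacher's logits on the same steps.
    """
    teacher_logits = teacher_logits_by_domain[domain]  # domain routing
    total = 0.0
    for s_logits, t_logits in zip(student_logits, teacher_logits):
        s_probs = softmax(s_logits)
        t_probs = softmax(t_logits)
        # Assumed stand-in for salient vocabulary alignment:
        # compare only the student's top-k tokens.
        total += reverse_kl(s_probs, t_probs, top_k_ids(s_probs, k))
    return total / len(student_logits)

# Toy data: two steps over a three-token vocabulary.
student = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
teachers = {
    "domain_a": [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]],  # agrees with the student
    "domain_b": [[0.1, 1.0, 2.0], [3.0, 0.5, 0.5]],  # disagrees with the student
}
loss_a = routed_distill_loss("domain_a", student, teachers)
loss_b = routed_distill_loss("domain_b", student, teachers)
```

In this sketch a trajectory routed to the agreeing teacher yields zero loss, while one routed to the disagreeing teacher yields a positive loss. In a real system that loss would drive gradient updates to the student.
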
This work offers a practical path to scaling agent horizons and suggests that a 35B model can match trillion-parameter models on long-horizon tasks.