arXiv cs.CL · 8d ago · research

OPD-Evolver: On-Policy Distillation for Holistic Agent Evolving


OPD-Evolver introduces a slow-fast co-evolution framework in which agents select, act on, and reuse experience through on-policy self-distillation. It is reported to outperform existing memory-based and training-based methods by up to 11.5% and 5.8%, respectively, and to rival large-scale models such as Qwen3.5-397B-A17B and Step-3.5-Flash.

Importance 3/3 · Beats a top-lab benchmark · New feature vs. leaders · New harness with differentiators · arXiv cs.CL · Mistral AI · Alibaba (Qwen) · DeepSeek · AI agents · Evaluation & benchmarks · Reasoning models
