ToolGraph enhances multi-turn tool-using agents by integrating schema topology, transition weights, and history-aware controls. Training with direct preference optimization (DPO) on 161 divergence-point preference pairs improves performance: ToolGraph+DPO achieves a 16.8% relative reward gain over the baseline, most notably on airline and retail tasks. Reward positivity emerges as the key diagnostic signal.
arXiv cs.CL
research
Self-Evolution of Tool-Calling Agents via Divergence-Point Preference Learning
Importance 2/3
New harness with differentiators
Allen AI
AI agents
Evaluation & benchmarks
Reasoning models