arXiv cs.CL · 2d ago · src: 3d ago · research

Self-Evolution of Tool-Calling Agents via Divergence-Point Preference Learning

ToolGraph enhances multi-turn tool-using agents by integrating schema topology, transition weights, and history-aware controls. Training with DPO on 161 divergence-point preference pairs improves performance: ToolGraph+DPO achieves a 16.8% relative reward gain over the baseline, most notably on airline and retail tasks. Reward positivity emerges as the key diagnostic signal.
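The summary does not say how the divergence-point pairs are built, so the following is only an illustrative sketch under stated assumptions. It assumes a trajectory is a list of tool-call strings, that a preference pair is taken at the first step where a higher-reward and a lower-reward trajectory differ, and that the pair is scored with the standard DPO objective. The names `first_divergence`, `make_pair`, and `dpo_loss` are hypothetical, and `beta=0.1` is a placeholder rather than a value from the paper.

```python
import math
from typing import List, Optional, Tuple


def first_divergence(good: List[str], bad: List[str]) -> Optional[int]:
    """Return the index of the first step where two trajectories differ.

    Returns None when the trajectories are identical.
    """
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            return i
    # One trajectory is a strict prefix of the other: they part ways
    # where the shorter one ends.
    if len(good) != len(bad):
        return min(len(good), len(bad))
    return None


def make_pair(good: List[str], bad: List[str]) -> Optional[Tuple[List[str], str, str]]:
    """Build (shared_prefix, chosen_step, rejected_step) at the divergence point.

    Assumption: both trajectories must still have a step at that index;
    otherwise no pair is produced.
    """
    i = first_divergence(good, bad)
    if i is None or i >= len(good) or i >= len(bad):
        return None
    return good[:i], good[i], bad[i]


def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, given sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pol_c - ref_c) - (pol_r - ref_r))))
    """
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Hypothetical tool-call traces for illustration only.
    good = ["get_user", "get_reservation", "cancel_reservation"]
    bad = ["get_user", "cancel_reservation"]
    print(make_pair(good, bad))
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

In this sketch the loss falls below log 2 once the policy prefers the chosen step more strongly than the reference model does, which is the behaviour the DPO objective rewards.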

Importance 2/3 · New harness with differentiators · arXiv cs.CL · Allen AI · AI agents · Evaluation & benchmarks · Reasoning models

Benchmarks

Benchmark	Model	Score
τ²-bench	ToolGraph+DPO	0.35
τ²-bench	ToolGraph	0.34
