Benchmark · agentic

τ²-bench

2 results · 2 models
[Chart: score (0 to 1) by date; series: ToolGraph, ToolGraph+DPO; both points dated 2026-06-23]
Timeline
  1. 2026-06-23 · ToolGraph · 0.338 · Self-Evolution of Tool-Calling Agents via Divergence-Point Preference Learning
  2. 2026-06-23 · ToolGraph+DPO · 0.355 · Self-Evolution of Tool-Calling Agents via Divergence-Point Preference Learning