A new method called probe-and-refine tuning uses synthetic bug-fix probes to iteratively improve repository guidance files with single-shot LLM calls, without agent loops or tool use. On SWE-bench Verified, it achieves a 33.0% mean resolve rate, 14.5 percentage points higher than the initial static knowledge base (18.5%). The gain comes from broader coverage rather than from more precise patches. The method also lets agents make effective use of larger step budgets, and performance remains stable across models as long as the diagnostic output is sufficient.
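As a rough illustration of the loop described above, here is a minimal Python sketch. Everything in it is hypothetical: `Probe`, `run_probe`, `refine`, `probe_and_refine`, and the `toy_llm` stub are illustrative names, not the paper's code, and an exact string match stands in for running a probe's tests. It only shows the general shape of the idea: score a guidance file on synthetic bug-fix probes with one LLM call per probe, rewrite the guidance with one more call based on the failures, and repeat.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical: a single-shot LLM call maps one prompt to one completion.
# No agent loop and no tool use, matching the setting described above.
LLMCall = Callable[[str], str]


@dataclass
class Probe:
    """A synthetic bug-fix probe (hypothetical structure)."""
    description: str
    expected_fix: str


def run_probe(llm: LLMCall, guidance: str, probe: Probe) -> bool:
    # One call per probe; exact match stands in for running the probe's tests.
    prompt = f"{guidance}\n\nBug: {probe.description}\nFix:"
    return llm(prompt).strip() == probe.expected_fix


def refine(llm: LLMCall, guidance: str, failures: List[Probe]) -> str:
    # One call that rewrites the guidance file using the failed probes.
    notes = "\n".join(f"- {p.description}" for p in failures)
    return llm(f"REFINE\n{guidance}\n\nFailed probes:\n{notes}")


def probe_and_refine(
    llm: LLMCall, guidance: str, probes: List[Probe], max_rounds: int = 3
) -> Tuple[str, List[float]]:
    rates: List[float] = []
    for _ in range(max_rounds):
        failures = [p for p in probes if not run_probe(llm, guidance, p)]
        rates.append(1 - len(failures) / len(probes))  # probe resolve rate
        if not failures:
            break  # every probe passes; stop refining
        guidance = refine(llm, guidance, failures)
    return guidance, rates


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for a model, for illustration only."""
    if prompt.startswith("REFINE\n"):
        # Refinement request: append one hint line per failed probe.
        body = prompt[len("REFINE\n"):]
        guidance, _, failed = body.partition("\n\nFailed probes:\n")
        hints = [f"hint: {line[2:]}" for line in failed.splitlines()]
        return guidance + "\n" + "\n".join(hints)
    # Probe request: "fix" the bug only if the guidance has a matching hint.
    guidance, _, rest = prompt.partition("\n\nBug: ")
    bug = rest.split("\nFix:")[0]
    return f"fix {bug}" if f"hint: {bug}" in guidance else "unknown"


probes = [
    Probe("off-by-one in range", "fix off-by-one in range"),
    Probe("null check missing", "fix null check missing"),
]
final_guidance, rates = probe_and_refine(
    toy_llm, "# Repo guidance\nhint: off-by-one in range", probes
)
print(rates)  # [0.5, 1.0]
```

With the toy stub, the first round resolves one of the two probes, the refinement call adds a hint for the failed one, and the second round resolves both. A real setup would send both kinds of call to an actual model and judge probes by running tests; this sketch deliberately leaves those parts abstract.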
arxiv
arXiv cs.LG · 6d ago · research
Probe-and-Refine Tuning Improves Coding Agent Performance
Importance 3/3
New feature vs. leaders
New harness with differentiators
Alibaba (Qwen)
Microsoft Research
NVIDIA
AI agents
Code generation
Reasoning models
Benchmarks
| Benchmark | Model | Score |
|---|---|---|
| SWE-bench Verified | NVIDIA-Nemotron-3-Nano-30B-A3B | — |