PairCoder introduces a two-agent pair-programming framework in which a Driver writes code and a Navigator reviews it against verification evidence, with the two switching roles when errors persist. The approach addresses the brittleness of single-pass inference when generating structured artifacts such as charts and CAD models by grounding review in the output of the toolchain.
- Evaluated across 17 public benchmarks and seven models from three vendors.
- Improved Blender scene executability from 0.20 to 0.78.
- Increased TikZ compile rate by 10 to 30 points on every model.
- Costs 2.9 to 9.2 times as much as single-model inference, averaging about 7 times overall.
The method provides a reliable recipe for verified code-driven generation, particularly where the toolchain offers an informative oracle.
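
The Driver/Navigator loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not PairCoder's actual implementation: the `Agent` interface, the `pair_loop` function, and the `patience` and `max_rounds` parameters are all hypothetical names, and the summary does not say exactly when roles switch beyond "when errors persist". Python's built-in `compile()` stands in for a real toolchain such as a TikZ compiler or Blender.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Agent:
    """Hypothetical stand-in for a model; both callables are assumptions."""
    name: str
    write: Callable[[str, str, str], str]   # (task, previous code, feedback) -> code
    review: Callable[[str, str, str], str]  # (task, code, evidence) -> feedback


def verify_python(code: str) -> Tuple[bool, str]:
    """Toy oracle: compile() stands in for the real toolchain."""
    try:
        compile(code, "<candidate>", "exec")
        return True, ""
    except SyntaxError as exc:
        return False, f"{type(exc).__name__}: {exc.msg} (line {exc.lineno})"


def pair_loop(task, first, second, verify, max_rounds=6, patience=2):
    """Run the Driver/Navigator loop; returns (code, ok, rounds, driver name)."""
    driver, navigator = first, second
    code, feedback, failures = "", "", 0
    for round_no in range(1, max_rounds + 1):
        code = driver.write(task, code, feedback)
        ok, evidence = verify(code)
        if ok:
            return code, True, round_no, driver.name
        # The Navigator reviews against the toolchain evidence, not the code alone.
        feedback = navigator.review(task, code, evidence)
        failures += 1
        if failures >= patience:
            # Assumed switching rule: swap roles after `patience` straight failures.
            driver, navigator = navigator, driver
            failures = 0
    return code, False, max_rounds, driver.name


# Toy agents: A always emits broken code, B emits valid code.
agent_a = Agent("A", lambda t, c, f: "print('hi'", lambda t, c, e: f"fix: {e}")
agent_b = Agent("B", lambda t, c, f: "print('hi')", lambda t, c, e: f"fix: {e}")

result = pair_loop("say hi", agent_a, agent_b, verify_python)
```

With these toy agents, A fails twice as Driver, the roles swap, and B's first attempt passes the check. The fixed failure count is only one possible reading of "persist"; a real system might instead compare successive error messages.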