PairCoder introduces a two-agent pair programming framework in which a Driver writes code and a Navigator reviews it against verification evidence, and the two switch roles when errors persist. By grounding review in toolchain feedback, the approach targets the brittleness of single-pass inference when generating structured artifacts such as charts and CAD models.
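The Driver/Navigator loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PairCoder's actual implementation: the `Agent` and `Verifier` callables, the prompt strings, and the role-switch rule (swap after `patience` consecutive failed verifications) are all hypothetical. The verifier stands in for the toolchain oracle, such as a compiler or renderer run that returns pass/fail plus a log.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical agent interface: takes a prompt string, returns text (code or review).
Agent = Callable[[str], str]
# Hypothetical toolchain oracle: returns (passed, log), e.g. from a compiler or renderer.
Verifier = Callable[[str], Tuple[bool, str]]


@dataclass
class PairResult:
    code: Optional[str]  # verified code, or None if no round passed
    rounds: int          # rounds used
    swaps: int           # number of Driver/Navigator role switches


def pair_program(task: str, agent_a: Agent, agent_b: Agent, verify: Verifier,
                 max_rounds: int = 6, patience: int = 2) -> PairResult:
    """Driver writes, toolchain verifies, Navigator reviews the evidence.

    Assumed rule: roles swap after `patience` consecutive failed verifications.
    """
    driver, navigator = agent_a, agent_b
    feedback, code = "", ""
    failures_in_a_row, swaps = 0, 0
    for rnd in range(1, max_rounds + 1):
        # Driver writes (or revises) code from the task, latest review, and previous attempt.
        code = driver(f"TASK:\n{task}\nREVIEW:\n{feedback}\nPREVIOUS:\n{code}")
        passed, log = verify(code)
        if passed:
            return PairResult(code, rnd, swaps)
        # Navigator reviews the code against the toolchain log rather than from memory.
        feedback = navigator(f"CODE:\n{code}\nTOOLCHAIN LOG:\n{log}")
        failures_in_a_row += 1
        if failures_in_a_row >= patience:
            # Errors persist: switch roles so the other agent drives.
            driver, navigator = navigator, driver
            swaps += 1
            failures_in_a_row = 0
    return PairResult(None, max_rounds, swaps)
```

With toy agents where one always emits failing code and the other emits passing code, the loop fails twice, swaps roles, and succeeds on the third round. Keying the switch to consecutive failures is only one plausible reading of "when errors persist"; the source does not specify the exact trigger.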

  • Evaluated across 17 public benchmarks and seven models from three vendors.
  • Improved Blender scene executability from 0.20 to 0.78.
  • Raised the TikZ compile rate by 10 to 30 percentage points on every model.
  • Costs 2.9 to 9.2 times as much as single-model inference, about 7 times on average overall.

The method provides a reliable recipe for verified code-driven generation, particularly where the toolchain offers an informative oracle.