VeriEvol: Scaling Multimodal Mathematical Reasoning via Verifiable Evol-Instruct

The authors introduce VeriEvol, an iterative framework that scales multimodal mathematical reasoning by decoupling prompt difficulty from answer reliability. The approach addresses the challenge of keeping reward labels reliable as data volume grows in reinforcement learning pipelines. A type-aware evolution module rewrites low-difficulty seeds into harder, image-grounded prompts through route-specific operators. Answers are verified by HTV-Agent, which accepts a response only after multi-source counter-evidence fails to refute it. Scaling the evolved supervised fine-tuning data from 10K to 250K samples raised mean accuracy across five benchmarks from 35.42 to 54.73. Under a fixed GRPO recipe, VeriEvol yielded a cumulative gain of +3.88 over an un-evolved baseline, of which +1.82 is attributed to the evolved prompts and +2.06 to the HTV-Agent verifier. The authors release all prompts, data, models, code, and full verifier traces to enable downstream auditing and scaling.
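
The acceptance rule described for the verifier (keep an answer only if no source of counter-evidence refutes it) can be illustrated with a minimal sketch. This is not the HTV-Agent implementation: the function names, the string-based interface, and the two toy checkers are all hypothetical, chosen only to show the accept-unless-refuted pattern on a trivial text-only problem.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface (an assumption, not from the paper): a counter-evidence
# source receives (question, answer) and returns a refutation message if it
# can refute the answer, or None if it finds nothing wrong.
CounterEvidence = Callable[[str, str], Optional[str]]


def accept_unless_refuted(
    question: str, answer: str, sources: Sequence[CounterEvidence]
) -> Tuple[bool, List[str]]:
    """Accept an answer only if every source fails to refute it."""
    if not sources:
        # Design choice of this sketch: with no sources, nothing was checked,
        # so refuse to accept vacuously.
        raise ValueError("at least one counter-evidence source is required")
    refutations = []
    for source in sources:
        message = source(question, answer)
        if message is not None:
            refutations.append(message)
    return len(refutations) == 0, refutations


# Toy sources for questions of the form "a + b" (illustrative only).
def recompute_sum(question: str, answer: str) -> Optional[str]:
    a, b = (int(term) for term in question.split("+"))
    if int(answer) != a + b:
        return f"recomputed {a + b}, got {answer}"
    return None


def non_negative(question: str, answer: str) -> Optional[str]:
    return "answer is negative" if int(answer) < 0 else None


if __name__ == "__main__":
    checks = [recompute_sum, non_negative]
    print(accept_unless_refuted("2 + 3", "5", checks))
    print(accept_unless_refuted("2 + 3", "6", checks))
```

Returning the collected refutations alongside the decision keeps a record of why an answer was rejected, loosely in the spirit of the verifier traces the authors say they release. The trace format here is an assumption of the sketch.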