VeriEvol introduces a verifiable data-construction framework for visual mathematical reasoning that decouples prompt difficulty from answer reliability. It evolves image-question prompts using type-aware operators and verifies answers through multi-source counter-evidence falsification. Across five benchmarks, scaling the training data from 10K to 250K samples improves mean accuracy from 35.42 to 54.73, with a cumulative gain of +3.88 over the baseline, driven by evolved prompts and HTV-Agent verification.
VeriEvol: Scaling Multimodal Mathematical Reasoning with Verifiable Evolution