Sevra, a serving-layer controller, selectively verifies answers to improve accuracy and reduce token usage. On \mathfive, it achieves 76.3% accuracy with 26.8% fewer post-generation tokens and halved harmful flips, while on \gsm it verifies only 3.0% of examples, boosting accuracy to 94.5% and cutting verification tokens by 91.2%. The study shows that initial solve length and explicit control needs determine optimal verification strategy.
Selective Verification for Budget-Aware Reasoning
from English