DART lets hybrid reasoning models route queries between direct answering and extended thinking without training data. It generates two no-think drafts, uses them to choose the response mode, and estimates the thinking budget from how much the drafts disagree. DART improves accuracy by up to 9.0 points on math and 22.5 points on code reasoning while reducing thinking tokens by 15-69% and 51-63%, respectively.
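To make the routing idea concrete, here is a minimal sketch of how two no-think drafts could drive both the mode decision and the budget estimate. Everything in it is an assumption for illustration and is not taken from the DART paper: the function names `disagreement` and `route`, the Jaccard-based disagreement measure, the rule that agreeing drafts mean a direct answer, the linear budget scaling, and the default budget values.

```python
def _tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens of a draft (hypothetical normalization)."""
    return set(text.lower().split())


def disagreement(draft_a: str, draft_b: str) -> float:
    """Return 1 - Jaccard similarity of the two drafts' token sets.

    Assumption: token overlap stands in for whatever disagreement
    signal the actual method uses.
    """
    a, b = _tokens(draft_a), _tokens(draft_b)
    if not a and not b:
        return 0.0  # two empty drafts trivially agree
    return 1.0 - len(a & b) / len(a | b)


def route(
    draft_a: str,
    draft_b: str,
    min_budget: int = 256,       # hypothetical lower bound on thinking tokens
    max_budget: int = 4096,      # hypothetical upper bound on thinking tokens
    agree_threshold: float = 0.0,
) -> tuple[str, int]:
    """Pick a response mode and a thinking budget from two no-think drafts.

    Assumed rule: if the drafts agree, answer directly with no thinking;
    otherwise think, with a budget that grows linearly with disagreement.
    """
    d = disagreement(draft_a, draft_b)
    if d <= agree_threshold:
        return ("no_think", 0)
    budget = int(min_budget + d * (max_budget - min_budget))
    return ("think", budget)
```

Under these assumptions, `route("x = 4", "x = 4")` selects the direct mode with a zero budget, while fully disjoint drafts receive the maximum budget. A real system would likely compare extracted final answers or use a semantic similarity measure instead of raw token overlap.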
DART: Training-Free Routing for Adaptive Thinking Budgets