Importance-Weighted On-Policy Distillation Addresses Position Bias
On-Policy Distillation (OPD) suffers from position bias: tokens later in a sequence provide poor supervision. We introduce Importance-Weighted On-Policy Distillation (IW-OPD), which assigns per-token weights based on distribution discrepancy and thereby prioritizes early tokens. IW-OPD converges faster and achieves gains of up to 6.9 points on AIME-2025.
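
To make the weighting idea concrete, the following is a minimal sketch and not the IW-OPD formula itself, which this summary does not state. It assumes that the per-token loss is the reverse KL divergence between student and teacher next-token distributions, and that a token's weight decays with the discrepancy accumulated over the preceding tokens, so that early tokens receive more weight. The function names, the exponential form, and the `temperature` parameter are all hypothetical.

```python
import math
from typing import List, Sequence


def token_reverse_kl(student_probs: Sequence[float], teacher_probs: Sequence[float]) -> float:
    """KL(student || teacher) over the vocabulary at a single position."""
    eps = 1e-12  # guards against log(0)
    return sum(
        s * (math.log(s + eps) - math.log(t + eps))
        for s, t in zip(student_probs, teacher_probs)
        if s > 0.0
    )


def prefix_discrepancy_weights(per_token_kl: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Hypothetical weighting (an assumption, not the paper's rule):
    w_t = exp(-(sum of KL over positions before t) / temperature),
    rescaled so that the weights average to 1."""
    if not per_token_kl:
        raise ValueError("per_token_kl must be non-empty")
    weights = []
    prefix = 0.0
    for kl in per_token_kl:
        weights.append(math.exp(-prefix / temperature))
        prefix += max(kl, 0.0)  # accumulate discrepancy for later positions
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]


def weighted_distillation_loss(
    student: Sequence[Sequence[float]],
    teacher: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> float:
    """Weighted mean of per-token reverse KL along one sampled sequence."""
    kls = [token_reverse_kl(s, t) for s, t in zip(student, teacher)]
    weights = prefix_discrepancy_weights(kls, temperature)
    return sum(w * k for w, k in zip(weights, kls)) / len(kls)


# Toy usage: three positions, two-word vocabulary.
student = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
teacher = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
weights = prefix_discrepancy_weights(
    [token_reverse_kl(s, t) for s, t in zip(student, teacher)]
)
loss = weighted_distillation_loss(student, teacher)
```

Under these assumptions the weights are non-increasing in position, and rescaling them to average 1 keeps the loss on a scale comparable to uniformly weighted OPD.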