Importance-Weighted On-Policy Distillation Addresses Position Bias
On-Policy Distillation (OPD) suffers from position bias: tokens later in a sequence provide poor supervision. Importance-Weighted OPD (IW-OPD) assigns each token a dynamic weight based on distribution discrepancy, prioritizing early tokens and suppressing late ones. IW-OPD converges faster than standard OPD and improves performance by up to 6.9 points on AIME-2025.
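
To make the weighting idea concrete, here is a minimal Python sketch of a discrepancy-based token weighting. It is an illustration under assumptions, not the formulation used by IW-OPD: the summary above does not give the weighting formula, so the per-token discrepancy (reverse KL between student and teacher), the exponential weighting form, the `temperature` parameter, and all function names are hypothetical choices made for this example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def importance_weights(discrepancies, temperature=1.0):
    """Map per-token discrepancies to weights with mean 1.

    ASSUMPTION: weights decay exponentially with discrepancy, so tokens
    where student and teacher diverge strongly are down-weighted. The
    actual IW-OPD weighting rule is not specified in the text.
    """
    raw = [math.exp(-d / temperature) for d in discrepancies]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def weighted_distillation_loss(student_probs, teacher_probs, temperature=1.0):
    """Weighted mean of per-token reverse KL(student || teacher).

    Each argument is a list with one probability distribution per token
    position. In a real training loop the weights would be treated as
    constants (no gradient flows through them); this sketch only
    computes numbers.
    """
    per_token = [kl_divergence(s, t) for s, t in zip(student_probs, teacher_probs)]
    weights = importance_weights(per_token, temperature)
    loss = sum(w * l for w, l in zip(weights, per_token)) / len(per_token)
    return loss, weights


# Toy example: the student drifts further from the teacher at later positions.
student = [[0.5, 0.5], [0.6, 0.4], [0.9, 0.1]]
teacher = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
loss, weights = weighted_distillation_loss(student, teacher)
print("weights:", weights)
print("loss:", loss)
```

In this toy input the discrepancy grows with position, so the weights shrink from the first token to the last, which mirrors the stated goal of prioritizing early tokens and suppressing late ones. Normalizing the weights to mean 1 keeps the loss on a scale comparable to the unweighted average.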