ARKD: Adaptive Reinforcement Learning-Guided Bidirectional KL Divergence Distillation for Text Generation
The authors propose ARKD, a reinforcement-learning-based distillation framework with adaptive KL weighting. It addresses the limitations of methods that rely on a single KL objective when compressing Large Language Models. A policy network dynamically assigns weights to the forward and reverse KL divergences based on the distributional characteristics of the teacher and student, so that the student achieves dual alignment on both the principal and the long-tail modes.
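
The weighting idea can be illustrated with a minimal sketch. It assumes the two divergences are mixed as a convex combination with a single scalar weight alpha in [0, 1]; the summary does not give the exact form of the objective, so this mixing rule is an assumption. In ARKD the weight would be produced by the policy network. Here it is passed in by hand, and the names softmax, kl, and weighted_bidirectional_kl are hypothetical helpers introduced only for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
    )


def weighted_bidirectional_kl(teacher_logits, student_logits, alpha):
    """Mix forward and reverse KL with a scalar weight alpha.

    Assumption: a convex combination. In ARKD the weight would come
    from a learned policy network; here the caller supplies it.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    p = softmax(teacher_logits)  # teacher distribution
    q = softmax(student_logits)  # student distribution
    forward = kl(p, q)  # KL(teacher || student)
    reverse = kl(q, p)  # KL(student || teacher)
    return alpha * forward + (1.0 - alpha) * reverse


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1, -1.0]
    student = [1.5, 1.2, 0.0, -0.5]
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, weighted_bidirectional_kl(teacher, student, alpha))
```

Setting alpha to 1.0 recovers pure forward KL and setting it to 0.0 recovers pure reverse KL, so a fixed single-KL objective is a special case of this mixture. An adaptive scheme would choose alpha per token or per batch instead of fixing it in advance.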