ARKD: Adaptive Reinforcement Learning-Guided Bidirectional KL Divergence Distillation for Text Generation

The authors propose ARKD, a reinforcement-learning-based adaptive KL-weighted distillation framework that addresses the limitations of methods relying on a single KL objective when compressing large language models. A policy network dynamically assigns weights to the forward and reverse KL divergences based on the distributional characteristics of the teacher and student, so that the student aligns with the teacher on both the principal and the long-tail modes.
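
For orientation, one plausible way to write such a weighted objective is shown below. This is an assumed form, not the formula from the paper: the summary does not give the exact loss, and the symbols p_T (teacher distribution), q_\theta (student distribution), and \alpha_t (the weight chosen by the policy network at step t) are introduced here for illustration only.

```latex
\mathcal{L}(\theta)
  = \alpha_t \, \mathrm{KL}\!\left(p_T \,\|\, q_\theta\right)
  + (1 - \alpha_t) \, \mathrm{KL}\!\left(q_\theta \,\|\, p_T\right),
  \qquad \alpha_t \in [0, 1]
```

Under this form, the forward term KL(p_T || q_\theta) penalizes the student for missing mass anywhere the teacher places it, including the long tail, while the reverse term KL(q_\theta || p_T) penalizes the student for placing mass where the teacher does not, which concentrates it on the principal modes.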

Utilizes a policy network guided by immediate reward signals to adaptively weight forward and reverse KL divergence.
Balances primary distribution fitting with long-tail probability modeling for improved generation quality.
Surpasses greedy heuristics by 0.4-0.6 points on the ROUGE-L and BERTScore metrics.
Demonstrates consistent improvements over other baseline methods across diverse benchmarks.
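
The weighting idea in the points above can be illustrated with a minimal Python sketch. It assumes a simple convex combination of the two divergences and treats the weight `alpha` as a plain number; in ARKD that weight would come from the policy network, which is not modeled here. The function names `kl` and `bidirectional_kl` and the toy distributions are hypothetical, chosen only for this example.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def bidirectional_kl(teacher, student, alpha):
    """Assumed form: alpha * forward KL + (1 - alpha) * reverse KL."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    forward = kl(teacher, student)  # KL(teacher || student): covers the long tail
    reverse = kl(student, teacher)  # KL(student || teacher): focuses on main modes
    return alpha * forward + (1.0 - alpha) * reverse


# Toy next-token distributions over a four-word vocabulary (made-up numbers).
teacher = [0.70, 0.20, 0.05, 0.05]
student = [0.90, 0.05, 0.03, 0.02]

# A fixed sweep stands in for the weights a learned policy would emit.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, round(bidirectional_kl(teacher, student, alpha), 4))
```

Sweeping `alpha` by hand shows how the loss moves between the pure reverse-KL and pure forward-KL extremes; an adaptive scheme would instead pick the weight per step from the observed teacher and student distributions.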

This approach enhances both the generation quality and generalization of compressed models by effectively addressing the trade-offs inherent in traditional knowledge distillation techniques.