Adaptive Data Scheduling Improves LLM Reinforcement Learning
Adaptive Data Scheduling (ADS) is a dual-level data scheduling framework that replaces uniform sampling with two mechanisms: an adaptive sampling distribution over semantic clusters, and policy-boundary sample selection. In experiments, ADS improves average accuracy by 5.2% over GRPO across three LLMs and seven reasoning benchmarks, which supports its use as a general strategy for LLM RL post-training.
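
To make the two levels concrete, the following is a minimal Python sketch of one way such a scheduler could be organized. It is not the ADS implementation: the summary above does not say how cluster weights are adapted or how the policy boundary is measured. The sketch assumes (1) each cluster carries a scalar utility score that is turned into sampling probabilities with a softmax, and (2) a sample is "near the policy boundary" when the current policy's rollout pass rate on it is close to 0.5. The names `cluster_probs`, `boundary_score`, `schedule_batch`, `utilities`, and `pass_rates` are all hypothetical.

```python
import math
import random


def cluster_probs(utilities, temperature=1.0):
    """Softmax over per-cluster utility scores (hypothetical scoring rule)."""
    top = max(utilities)
    exps = [math.exp((u - top) / temperature) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def boundary_score(pass_rate):
    """Assumed boundary proxy: 1.0 at a pass rate of 0.5, 0.0 at 0 or 1."""
    return 1.0 - abs(2.0 * pass_rate - 1.0)


def schedule_batch(clusters, utilities, pass_rates, batch_size, rng):
    """Pick a batch by sampling a cluster, then its most boundary-like sample.

    clusters:   list of lists of sample ids (one inner list per cluster)
    utilities:  one float per cluster (assumed to be maintained elsewhere)
    pass_rates: dict mapping sample id -> rollout pass rate in [0, 1]
    """
    probs = cluster_probs(utilities)
    target = min(batch_size, sum(len(c) for c in clusters))
    chosen = set()
    batch = []
    while len(batch) < target:
        # Level 1: sample a cluster, skipping clusters that are exhausted.
        # The small floor keeps weights positive if the softmax underflows.
        weights = [
            (p + 1e-12) if any(s not in chosen for s in c) else 0.0
            for p, c in zip(probs, clusters)
        ]
        idx = rng.choices(range(len(clusters)), weights=weights, k=1)[0]
        # Level 2: take the unused sample closest to the assumed boundary.
        remaining = [s for s in clusters[idx] if s not in chosen]
        best = max(remaining, key=lambda s: boundary_score(pass_rates[s]))
        chosen.add(best)
        batch.append(best)
    return batch


if __name__ == "__main__":
    clusters = [["a", "b"], ["c", "d"]]
    pass_rates = {"a": 0.5, "b": 1.0, "c": 0.0, "d": 0.4}
    print(schedule_batch(clusters, [0.0, 0.0], pass_rates, 2, random.Random(0)))
```

In this sketch, samples the policy always solves or always fails get a boundary score of zero and are picked last. That reflects one common reading of "policy boundary", not a confirmed detail of ADS. A uniform-sampling baseline would correspond to equal cluster weights and a random choice within each cluster.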