Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning
Researchers introduce the Holistic Data Scheduler (HDS), an online data mixing framework that addresses the limitations of existing methods by treating the data composition as dynamic and considering it along multiple dimensions. HDS formulates data scheduling as a reinforcement learning problem, solved with the Soft Actor-Critic algorithm and guided by a multi-objective reward function.
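To make the reinforcement learning framing concrete, the following is a minimal sketch of the two pieces a scheduler of this kind needs around its policy: turning a continuous action into data-mixing weights, and collapsing several objective signals into one scalar reward. It is not the HDS implementation. The function names, the softmax mapping, the linear scalarization, and the objective names (`quality`, `diversity`) are all assumptions made for illustration; the summary above does not specify them. The Soft Actor-Critic policy itself is omitted, and a fixed action vector stands in for its output.

```python
import math
import random


def mixture_from_action(action):
    """Map an unconstrained action vector to sampling weights via softmax.

    Assumption: one action dimension per data domain.
    """
    peak = max(action)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in action]
    total = sum(exps)
    return [e / total for e in exps]


def scalarized_reward(objectives, coefficients):
    """Combine several objective signals into one scalar by a weighted sum.

    Assumption: linear scalarization; other combination rules are possible.
    """
    return sum(coefficients[name] * value for name, value in objectives.items())


def sample_domains(weights, batch_size, rng):
    """Draw one domain index per example according to the mixing weights."""
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)


if __name__ == "__main__":
    rng = random.Random(0)
    action = [0.2, -0.1, 0.5]  # stand-in for a policy output
    weights = mixture_from_action(action)
    batch = sample_domains(weights, batch_size=8, rng=rng)
    # Hypothetical objective values measured after training on the batch.
    reward = scalarized_reward(
        {"quality": 0.8, "diversity": 0.4},
        {"quality": 0.7, "diversity": 0.3},
    )
    print(weights, batch, reward)
```

In a full training loop, the reward computed this way would be fed back to the policy after each scheduling step so that later mixing weights reflect how earlier ones performed.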