arXiv cs.CL · 23h ago · src: 2d ago · research

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

HDS is a multi-objective reinforcement learning framework for online data mixing in LLM pre-training. The authors report that it requires 44% fewer training iterations on The Pile and improves 0-shot MMLU performance by 7.2%, with consistent gains on other benchmarks.

Importance 2/3 · arXiv cs.CL · Evaluation & benchmarks · Research paper
