RegMix-D extends RegMix by using the full loss trajectories of proxy runs to select data mixtures dynamically. It outperforms both RegMix and DoReMi across 13 downstream tasks while training just 128 proxy models, 25% of RegMix's compute budget.
RegMix-D: Dynamic Data Mixing via Proxy Training Trajectories