Researchers introduce Evolution Fine-Tuning (EFT), a mid-training paradigm that teaches large language models (LLMs) to evolve solutions across diverse tasks by converting evolutionary search trajectories into supervision. Prior methods discard the experience accumulated during search, so each new problem is solved from scratch; EFT instead lets models reuse the discovery capabilities they have already learned.
- The authors construct the Finch Collection, a dataset of 156K trajectories spanning 10 domains and 371 optimization tasks.
- The authors fine-tune open-source LLMs ranging from 2B to 9B parameters with EFT.
- EFT confers cross-task generalization, with models surpassing their base counterparts by an average of 10.22% across 22 held-out tasks.
- When paired with test-time reinforcement learning, an EFT-trained model matches state-of-the-art performance on two circle-packing tasks and outperforms its base counterpart on the Erdős minimum-overlap problem.
EFT serves as a practice phase for general-purpose discovery agents, allowing them to iteratively evolve solutions and reuse learned strategies across different optimization challenges.
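
To make the idea of "converting evolutionary search trajectories into supervision" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the `Step` record, the `trajectory_to_examples` function, the prompt template, and the rule of keeping only score-improving steps are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    """One candidate produced during an evolutionary search (hypothetical format)."""
    solution: str  # candidate solution text, e.g. a program
    score: float   # task-specific fitness; higher is assumed to be better


def trajectory_to_examples(task: str, trajectory: List[Step]) -> List[Tuple[str, str]]:
    """Turn one search trajectory into (prompt, target) fine-tuning pairs.

    Assumption: only steps that beat the best score seen so far are kept,
    so each pair demonstrates a successful improvement.
    """
    examples: List[Tuple[str, str]] = []
    if not trajectory:
        return examples

    best = trajectory[0]
    for step in trajectory[1:]:
        if step.score > best.score:
            prompt = (
                f"Task: {task}\n"
                f"Current solution (score {best.score}):\n{best.solution}\n"
                "Propose an improved solution."
            )
            examples.append((prompt, step.solution))
            best = step  # the improved solution becomes the new parent
    return examples


if __name__ == "__main__":
    demo = [Step("v0", 1.0), Step("v1", 0.5), Step("v2", 2.0)]
    for prompt, target in trajectory_to_examples("toy task", demo):
        print(prompt, "->", target)
```

Filtering for improving steps is only one possible design; a real dataset could just as well keep failed mutations as negative signal or include the full search history in the prompt.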