Two-Stage Alignment Improves Math Tutoring Pedagogy

A two-stage alignment pipeline improves the pedagogical performance of large language models in math mistake remediation. The approach combines supervised fine-tuning with Direct Preference Optimization (DPO), using synthetic data focused on scaffolding and factuality. The resulting model outperforms both base models and existing tutoring models in accuracy and teaching quality. In human evaluations, it is competitive with a proprietary baseline while offering greater openness and reproducibility.
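
To make the second stage concrete, the following is a minimal sketch of the standard per-example DPO loss, not the authors' training code. It assumes the summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") tutor response are already available from both the model being trained and a frozen reference model, which in a two-stage setup would typically be the supervised fine-tuned checkpoint. The function name, the argument names, and the value beta=0.1 are illustrative assumptions; the summary does not state any hyperparameters.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_gain - rejected_gain))).

    All names and the default beta are hypothetical, chosen for illustration.
    """
    # How much more (in log-probability) the policy favors each response
    # than the frozen reference model does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin; positive when the policy has shifted
    # toward the chosen response relative to the reference.
    margin = beta * (chosen_gain - rejected_gain)

    # -log(sigmoid(x)) equals log(1 + exp(-x)); this form avoids overflow
    # for large positive or negative margins.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical log-probabilities for one preference pair, e.g. a scaffolded
# hint (chosen) versus a response that simply reveals the answer (rejected).
loss_before = dpo_loss(-12.0, -12.0, -12.0, -12.0)  # policy equals reference
loss_after = dpo_loss(-10.0, -15.0, -12.0, -12.0)   # policy prefers chosen
```

When the policy and the reference agree, the margin is zero and the loss equals log 2. The loss falls as the policy assigns relatively more probability to the chosen response, which is the signal that would push a tutoring model toward the preferred scaffolding or factuality behavior.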