A study of sequential Direct Preference Optimization (DPO) finds that later training stages do not uniformly degrade preferences learned in earlier stages. The effect depends on the relationship between the objectives, signal strength, and training order, and it ranges from partial degradation to positive transfer. Pair-level analysis reveals heterogeneous changes: high-confidence preference pairs sometimes improve even when aggregate metrics remain stable.
Sequential DPO Shows Variable Preference Impact Across Settings