The authors introduce DanceOPD, an on-policy generative field distillation framework that unifies text-to-image generation with local and global editing in flow-matching models. Each sample is routed to a single capability field, and the student is trained with a velocity MSE objective so that expert skills compose without interfering with one another.
- Routes each sample to one capability field and queries that field at a single low-noise state reached by the student's own rollout.
- Trains with a simple velocity MSE objective to compose expert capabilities from fields queried on rollout states.
- Absorbs operator-defined fields such as classifier-free guidance (CFG).
- Improves multi-capability composition while strengthening target capabilities and preserving anchor generation quality.
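The points above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the names `cfg_field`, `rollout`, `velocity_mse`, and `opd_batch_loss`, the Euler integrator, the time convention (t = 1 is pure noise, small t is low noise), and the default `t_query` and `steps` values are all assumptions made for illustration. The CFG combination is the standard one, `v_uncond + scale * (v_cond - v_uncond)`, wrapped so that it looks like any other field.

```python
from typing import Callable, List, Sequence

Vec = List[float]
Field = Callable[[Vec, float], Vec]  # (state x_t, time t) -> velocity


def cfg_field(cond: Field, uncond: Field, scale: float) -> Field:
    """Wrap classifier-free guidance as a single field (standard CFG formula)."""
    def field(x: Vec, t: float) -> Vec:
        v_c, v_u = cond(x, t), uncond(x, t)
        return [u + scale * (c - u) for c, u in zip(v_c, v_u)]
    return field


def rollout(student: Field, x: Vec, t_start: float, t_query: float, steps: int) -> Vec:
    """Euler-integrate the student's own velocity from t_start to t_query."""
    dt = (t_query - t_start) / steps  # negative when moving from noise toward data
    t = t_start
    for _ in range(steps):
        v = student(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x


def velocity_mse(student: Field, teacher: Field, x: Vec, t: float) -> float:
    """Mean squared error between student and teacher velocities at (x, t)."""
    v_s, v_t = student(x, t), teacher(x, t)
    return sum((a - b) ** 2 for a, b in zip(v_s, v_t)) / len(v_s)


def opd_batch_loss(
    student: Field,
    fields: Sequence[Field],
    routes: Sequence[int],
    noises: Sequence[Vec],
    t_query: float = 0.1,  # assumed low-noise query time
    steps: int = 4,        # assumed number of rollout steps
) -> float:
    """Route each sample to one field and score the student at one rollout state."""
    total = 0.0
    for k, noise in zip(routes, noises):
        x_q = rollout(student, noise, 1.0, t_query, steps)  # student-induced state
        total += velocity_mse(student, fields[k], x_q, t_query)
    return total / len(noises)


if __name__ == "__main__":
    # Toy stand-ins for expert fields; real fields would be neural networks.
    expert_a: Field = lambda x, t: [-xi for xi in x]
    expert_b: Field = lambda x, t: [1.0 - xi for xi in x]
    guided = cfg_field(expert_b, expert_a, scale=2.0)
    student: Field = lambda x, t: [0.5 - xi for xi in x]
    loss = opd_batch_loss(student, [expert_a, guided], [0, 1], [[0.3, -0.2], [1.1, 0.4]])
    print(f"toy batch loss: {loss:.4f}")
```

Because the guided combination is wrapped as an ordinary `Field`, the loss treats it exactly like an expert field, which is one plausible reading of "absorbing" CFG. A gradient-based version would also need to decide whether gradients flow through the rollout, which this summary does not specify.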
This work establishes a practical route for generative field distillation in flow-matching models and addresses its central challenge: composing diverse image generation capabilities.