A study introduces Jensen-Shannon (JS) divergence into GRPO-style (Group Relative Policy Optimization) post-training of autoregressive text-to-image models and shows that it balances policy optimization against generation diversity. In experiments on LlamaGen and Janus-7B, JS divergence achieves top or otherwise strong results on the evaluation metrics while preserving diverse outputs.
JS Divergence Improves GRPO Autoregressive Text-to-Image Alignment
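For readers unfamiliar with the quantity involved, the following is a minimal sketch of the standard Jensen-Shannon divergence between two categorical distributions, such as a policy's and a reference model's next-token probabilities. The summary does not state how the study plugs the divergence into the GRPO objective, so the function names `kl_divergence` and `js_divergence` and the toy distributions are illustrative assumptions, not the authors' implementation.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two categorical distributions given as lists.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: the mean KL of p and q to their mixture m."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical next-token distributions over a 3-token vocabulary.
policy = [0.7, 0.2, 0.1]
reference = [0.4, 0.4, 0.2]

js = js_divergence(policy, reference)
print(f"JS(policy, reference) = {js:.4f} nats")
```

Unlike KL divergence, this quantity is symmetric in its two arguments and bounded above by ln 2, so it stays finite even when the two distributions place mass on different tokens.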