A study introduces Jensen-Shannon (JS) divergence into GRPO-style post-training of autoregressive text-to-image models and shows that it balances policy optimization against generation diversity. Experiments on LlamaGen and Janus-7B show that JS divergence achieves the best or competitive scores across the evaluation metrics while preserving output diversity.
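For reference, the JS divergence between distributions P and Q is JS(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), where M = 0.5 * (P + Q). Unlike KL divergence, it is symmetric and bounded above by ln 2 (in nats). The sketch below is a minimal illustration, not the study's implementation. It assumes that the JS term compares the policy's per-token next-token distributions with those of a frozen reference model (in the role the KL penalty plays in standard GRPO) and that it is averaged over token positions. The names `js_divergence` and `mean_token_js` are hypothetical.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats for discrete distributions; terms with p_i = 0 contribute 0.

    Assumes q_i > 0 wherever p_i > 0 (always true when q is the JS mixture M).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: mean KL of p and q to their mixture m."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single vocabulary-sized logit vector."""
    mx = max(logits)
    exps = [math.exp(x - mx) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_token_js(policy_logits: Sequence[Sequence[float]],
                  ref_logits: Sequence[Sequence[float]]) -> float:
    """Hypothetical penalty: per-token JS(policy || reference) averaged over positions.

    Each inner sequence holds the logits over the (image-token) vocabulary
    at one generation step.
    """
    per_token = [js_divergence(softmax(pl), softmax(rl))
                 for pl, rl in zip(policy_logits, ref_logits)]
    return sum(per_token) / len(per_token)


# Usage: disjoint distributions hit the ln 2 upper bound; identical ones give 0.
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # ln 2 ~= 0.6931
print(mean_token_js([[2.0, 0.5, -1.0]], [[2.0, 0.5, -1.0]]))  # 0.0
```

Because the value is bounded, a JS penalty cannot grow without limit as the policy drifts from the reference. This is one plausible reason it behaves differently from KL, but the study's own analysis should be consulted for the actual mechanism.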