STAR: SpatioTemporal Adaptive Reward Allocation for Text-to-Image RL Post-Training
STAR introduces a spatio-temporal reward allocation method for text-to-image RL post-training that uses attention maps to dynamically assign advantages across image regions and denoising steps. Applied to Stable Diffusion 3.5 Medium, it improves semantic alignment, text rendering, and human-preference alignment, reaching 0.9759 on GenEval, 0.9757 on OCR, and 23.60 on PickScore.
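The summary does not give STAR's actual allocation rule. The sketch below illustrates only the general idea of spreading one scalar advantage over denoising steps and spatial positions according to attention maps. Everything in it is a hypothetical assumption, not the authors' method: the function name `allocate_advantage`, the input layout (`attn_maps[t][p]`, already aggregated over heads and prompt tokens), the choice of attention "peakiness" as the per-step importance signal, and the normalization that keeps the mean allocated advantage equal to the original scalar.

```python
from typing import List


def allocate_advantage(advantage: float, attn_maps: List[List[float]]) -> List[List[float]]:
    """Hypothetical spatio-temporal allocation of one scalar advantage.

    attn_maps[t][p] is an assumed non-negative attention score for spatial
    position p at denoising step t. Returns a T x P grid of per-step,
    per-position advantages whose overall mean is (approximately) `advantage`.
    """
    num_steps = len(attn_maps)
    num_positions = len(attn_maps[0])
    eps = 1e-8  # guards against division by zero for all-zero maps

    spatial: List[List[float]] = []
    peaks: List[float] = []
    for row in attn_maps:
        total = sum(row) + eps
        dist = [a / total for a in row]  # attention as a distribution over positions
        # Spatial weights: the distribution rescaled so its mean is 1.
        spatial.append([d * num_positions for d in dist])
        # Assumed temporal signal: steps with more concentrated attention get more credit.
        peaks.append(max(dist))

    # Temporal weights: step importances rescaled so their mean is 1.
    peak_sum = sum(peaks) + eps
    temporal = [pk / peak_sum * num_steps for pk in peaks]

    # Because both weight sets have mean 1, the grid's mean stays at `advantage`.
    return [
        [advantage * temporal[t] * spatial[t][p] for p in range(num_positions)]
        for t in range(num_steps)
    ]
```

For example, `allocate_advantage(1.0, [[1, 1, 1, 1], [4, 0, 0, 0]])` gives a uniform step about 0.4 per position and concentrates about 6.4 on the single attended position of the focused step, while the mean over all eight entries remains 1.0. Preserving the mean is one simple way to redistribute credit without changing the overall scale of the policy-gradient signal; a real method may normalize differently.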