DiT-Reward: Generative Representations for Text-to-Image Reward Modeling
The article introduces DiT-Reward, a method that converts a pretrained text-to-image Diffusion Transformer (DiT) into a reward model. It passes near-clean image latents through the network and aggregates the text-conditioned representations produced across the transformer layers. The approach thus reuses the generative model's own representations to evaluate the quality of generated images, without requiring separate training objectives.
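
The pipeline described above (lightly noise a latent, collect per-layer features, aggregate them into a score) can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration rather than the article's implementation: the linear noise blend at a small timestep `t`, mean pooling over tokens, softmax layer weights, the linear scoring head, and all function names (`near_clean_latent`, `aggregate_layers`, `reward`) are hypothetical. The DiT forward pass itself is not modeled; the per-layer hidden states are toy values standing in for what a real transformer would return.

```python
import math
import random


def near_clean_latent(latent, t=0.05, seed=0):
    """Blend a clean latent with a little Gaussian noise.

    Assumption: a linear interpolation (1 - t) * z + t * eps with small t
    stands in for "near-clean"; the article's actual schedule may differ.
    """
    rng = random.Random(seed)
    return [(1.0 - t) * z + t * rng.gauss(0.0, 1.0) for z in latent]


def mean_pool(tokens):
    """Average a list of equal-length token vectors into one vector."""
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_tokens, layer_logits):
    """Softmax-weighted sum of mean-pooled features, one entry per layer."""
    weights = softmax(layer_logits)
    pooled = [mean_pool(tokens) for tokens in layer_tokens]
    dim = len(pooled[0])
    return [sum(w * p[d] for w, p in zip(weights, pooled)) for d in range(dim)]


def reward(layer_tokens, layer_logits, head_w, head_b=0.0):
    """Scalar score: linear head applied to the aggregated feature."""
    feat = aggregate_layers(layer_tokens, layer_logits)
    return sum(w * f for w, f in zip(head_w, feat)) + head_b


# Toy hidden states: 2 layers, 2 tokens per layer, feature dimension 2.
toy_layers = [
    [[1.0, 0.0], [3.0, 0.0]],  # layer 0, pools to [2.0, 0.0]
    [[0.0, 2.0], [0.0, 4.0]],  # layer 1, pools to [0.0, 3.0]
]
score = reward(toy_layers, layer_logits=[0.0, 0.0], head_w=[1.0, 1.0])
print(score)  # 2.5 (equal layer weights give the feature [1.0, 1.5])
```

In a real setting, the layer logits and head weights would be learned parameters, and the hidden states would come from running the pretrained transformer on the noised latent together with the text prompt.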