Gazer is a training-free framework that uses feedback from a multimodal large language model (MLLM) to correct semantic errors as they arise while an autoregressive visual model is generating. By combining a reflective diagnosis stage with a semantic correction stage, Gazer improves compositional accuracy and semantic alignment across multiple models without additional training.
Gazer: Training-Free Semantic Correction for Autoregressive Visual Models
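
The summary names two stages, reflective diagnosis and semantic correction, but gives no implementation detail. The following is only a minimal sketch of how a generic diagnose-then-correct loop around an autoregressive generator might look, not Gazer's actual algorithm. Every name here (`generate_with_feedback`, `propose`, `diagnose`, `check_every`, `max_retries`) is hypothetical, the MLLM critic is replaced by a toy string check, and the rollback-and-regenerate strategy is an assumption made for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical signatures: neither comes from the Gazer paper.
Propose = Callable[[List[str], Optional[str]], str]   # (tokens so far, hint) -> next token
Diagnose = Callable[[List[str]], Optional[str]]       # tokens so far -> issue text or None


def generate_with_feedback(
    propose: Propose,
    diagnose: Diagnose,
    num_steps: int,
    check_every: int = 2,
    max_retries: int = 3,
) -> List[str]:
    """Generate tokens one by one, pausing at checkpoints to ask a critic.

    If the critic reports an issue, the current chunk is discarded and
    regenerated with the issue passed back as a hint. No weights are updated,
    which is the sense in which this loop is training-free.
    """
    tokens: List[str] = []
    chunk_start = 0               # index where the current unchecked chunk begins
    hint: Optional[str] = None
    retries = 0

    while len(tokens) < num_steps:
        tokens.append(propose(tokens, hint))

        at_checkpoint = (len(tokens) - chunk_start == check_every
                         or len(tokens) == num_steps)
        if not at_checkpoint:
            continue

        issue = diagnose(tokens)                  # diagnosis stage (assumed)
        if issue is not None and retries < max_retries:
            del tokens[chunk_start:]              # correction stage (assumed): roll back
            hint = issue                          # and steer the regeneration
            retries += 1
        else:
            chunk_start = len(tokens)             # accept the chunk and move on
            hint = None

    return tokens


# Toy stand-ins so the sketch runs without any model.
def toy_propose(tokens: List[str], hint: Optional[str]) -> str:
    target = ["red", "cube", "on", "table"]
    drifted = ["blue", "cube", "on", "table"]     # simulated semantic error
    source = target if hint is not None else drifted
    return source[len(tokens)]


def toy_diagnose(tokens: List[str]) -> Optional[str]:
    return "use red, not blue" if "blue" in tokens else None
```

Calling `generate_with_feedback(toy_propose, toy_diagnose, num_steps=4)` regenerates the first chunk once after the toy critic flags it. In a real system, `propose` would wrap the visual model's sampling step and `diagnose` would query an MLLM on a decoded partial image; how Gazer actually does either is not stated in the summary above.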