SPOT-E is a test-time method that uses visual spotlights to strengthen evidence grounding in frozen vision-language models (VLMs). It uses low-entropy anchors and an entropy-shaping objective to reduce answer uncertainty while preserving high-confidence tokens, which improves robustness to visual corruptions across benchmarks and VLM families.
SPOT-E: Test-Time Entropy Shaping with Visual Spotlights for Frozen VLMs
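The summary above names two ingredients, low-entropy anchors and an entropy-shaping objective, without giving their exact form. The following is a minimal illustrative sketch of one way such an objective could be written, not the formulation from the paper. Everything in it is an assumption made for illustration: the function names (`split_anchors`, `shaping_objective`), the entropy `threshold`, the `weight`, and the choice of a KL penalty to keep anchor tokens close to their original distributions. Visual spotlights are not modeled here.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one token's logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl(p, q, eps=1e-12):
    # KL(p || q), with q clamped away from zero to avoid division by zero.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)

def split_anchors(token_logits, threshold):
    # ASSUMPTION: a token counts as a low-entropy "anchor" when its
    # predictive entropy falls below a fixed threshold.
    ents = [entropy(softmax(row)) for row in token_logits]
    anchors = [i for i, h in enumerate(ents) if h < threshold]
    uncertain = [i for i, h in enumerate(ents) if h >= threshold]
    return anchors, uncertain

def mean(values):
    return sum(values) / len(values) if values else 0.0

def shaping_objective(ref_logits, new_logits, threshold=0.5, weight=1.0):
    # ASSUMPTION: the objective is (mean entropy of uncertain tokens)
    # plus weight * (mean KL drift of anchor tokens from the reference).
    anchors, uncertain = split_anchors(ref_logits, threshold)
    ent_term = mean([entropy(softmax(new_logits[i])) for i in uncertain])
    keep_term = mean([kl(softmax(ref_logits[i]), softmax(new_logits[i]))
                      for i in anchors])
    return ent_term + weight * keep_term

if __name__ == "__main__":
    ref = [[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]]      # confident token, uncertain token
    sharper = [[10.0, 0.0, 0.0], [5.0, 0.0, 0.0]]  # uncertain token made more peaked
    print(split_anchors(ref, 0.5))
    print(shaping_objective(ref, ref), shaping_objective(ref, sharper))
```

In this sketch, lowering the objective makes the uncertain tokens more peaked, while the KL term penalizes any change to tokens that were already confident. In a frozen-model setting the quantity being adjusted would be something other than the model weights, and that detail is left out here.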