Low-cost concept-based localized explanations: How far can we get with training-free approaches?
This study evaluates whether mid-scale Multimodal Large Language Models (MLLMs) can perform localized concept naming under strict zero-shot conditions by assigning labels to bounding-box regions. The authors propose a reproducible evaluation protocol for Concept Naming that includes closed-set prompting and an embedding-similarity-based strategy for large label spaces.
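To illustrate the embedding-similarity idea for large label spaces, the sketch below maps a free-text answer for a region to the nearest label in a fixed vocabulary by cosine similarity. It is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not say which encoder is used or what exactly is embedded, so `embed` is a hypothetical stand-in built on character trigrams, and `map_to_label` is a hypothetical helper name. In practice a learned text encoder would presumably replace `embed`.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy stand-in for a text encoder (assumption): bag of character n-grams."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_to_label(free_text, labels):
    """Return the label most similar to the free-text answer, with its score."""
    query = embed(free_text)
    scored = [(cosine(query, embed(label)), label) for label in labels]
    best_score, best_label = max(scored)
    return best_label, best_score


# Hypothetical label vocabulary and model answer, for illustration only.
labels = ["striped tail", "yellow beak", "wooden leg"]
label, score = map_to_label("a striped tail", labels)
print(label, round(score, 3))
```

The appeal of this design is that the label set never has to fit in the prompt: the model answers freely, and the mapping to the closed vocabulary happens afterwards, so the prompt length stays constant as the vocabulary grows.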