Low-cost concept-based localized explanations: How far can we get with training-free approaches?

This study evaluates whether mid-scale Multimodal Large Language Models (MLLMs) can perform localized concept naming under strict zero-shot conditions by assigning labels to bounding-box regions. The authors propose a reproducible evaluation protocol for Concept Naming that includes closed-set prompting and an embedding-similarity-based strategy for large label spaces.
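
As an illustration of the embedding-similarity idea for large label spaces, the sketch below maps a model's free-form answer to the nearest label in a fixed vocabulary using cosine similarity. It is a minimal sketch, not the authors' implementation: the function names `cosine` and `map_to_label` are hypothetical, and the three-dimensional vectors are toy stand-ins for the output of a real text-embedding model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def map_to_label(answer_vec, label_vecs):
    """Return the label whose embedding is most similar to the answer embedding."""
    return max(label_vecs, key=lambda name: cosine(answer_vec, label_vecs[name]))

# Toy embeddings (assumption): in practice these would come from a text encoder.
label_vecs = {
    "dog": [1.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}
answer_vec = [0.9, 0.1, 0.0]  # embedding of a free-form answer such as "a puppy"

predicted = map_to_label(answer_vec, label_vecs)
```

Matching in embedding space avoids listing every candidate label in the prompt, which is presumably why such a strategy suits label spaces too large for closed-set prompting.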

Experiments with four MLLMs ranging from 7B to 32B parameters demonstrate consistent performance trends across datasets.
The models achieve object-level exact-match accuracy between 62% and 88%.
The research highlights the potential of training-free concept annotation from localized regions for Concept-based Explainable AI (C-XAI).
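
For readers unfamiliar with the metric, exact-match accuracy can be sketched as the fraction of regions whose predicted label equals the reference label. The snippet below is an illustrative assumption: the helper names are hypothetical, and the normalization step (lowercasing and whitespace collapsing) is a common convention that the source does not confirm.

```python
def normalize(label):
    """Lowercase and collapse whitespace (assumed normalization, not from the source)."""
    return " ".join(label.lower().split())

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Made-up labels for four bounding-box regions, for illustration only.
preds = ["Dog", "cat ", "truck", "person"]
refs = ["dog", "cat", "car", "person"]
acc = exact_match_accuracy(preds, refs)
```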

The authors release a reproducible framework to support future low-cost C-XAI research and discuss the limitations and failure modes identified during the evaluation.