MedHal-Loc Benchmark Tests Localization Faithfulness in Medical Hallucination Detectors

MedHal-Loc is a benchmark for evaluating whether medical hallucination detectors localize errors accurately, that is, whether the location a detector flags is where the error actually occurs. Some architectures localize well above chance. A knowledge-graph pipeline, by contrast, localizes no better than random despite strong detection performance, a failure attributed to poor entity extraction. Detection capability therefore does not guarantee faithful localization, which challenges the assumption that an architecture built to be explainable will localize faithfully.
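
To make the "above chance" comparison concrete, here is a minimal sketch of one way localization could be scored against a random baseline. It is not the benchmark's actual metric, which this summary does not specify. It assumes sentence-level localization with exactly one erroneous sentence per example, and the names `Example`, `localization_accuracy`, and `chance_accuracy` are hypothetical. The data are made-up toy values, not results from MedHal-Loc.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One generated text with a single hallucinated sentence (assumed setup)."""
    num_sentences: int    # number of candidate locations in the text
    gold_index: int       # sentence that actually contains the error
    predicted_index: int  # sentence the detector flags


def localization_accuracy(examples: List[Example]) -> float:
    """Fraction of examples where the flagged sentence is the erroneous one."""
    hits = sum(ex.predicted_index == ex.gold_index for ex in examples)
    return hits / len(examples)


def chance_accuracy(examples: List[Example]) -> float:
    """Expected accuracy of flagging one sentence uniformly at random."""
    return sum(1 / ex.num_sentences for ex in examples) / len(examples)


# Toy data for illustration only (not benchmark results).
examples = [
    Example(num_sentences=4, gold_index=2, predicted_index=2),
    Example(num_sentences=5, gold_index=0, predicted_index=3),
    Example(num_sentences=2, gold_index=1, predicted_index=1),
    Example(num_sentences=4, gold_index=3, predicted_index=0),
]

acc = localization_accuracy(examples)
chance = chance_accuracy(examples)
print(f"localization accuracy: {acc:.2f}, chance: {chance:.2f}")
```

Under these assumptions, a detector can still flag a text as hallucinated correctly, since detection is scored separately, while its localization accuracy sits near the chance value. That gap between detection and localization is what the summary describes for the knowledge-graph pipeline.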