Improving Verbalized Uncertainty Calibration in Medical VQA
This work addresses the tendency of multimodal large language models to produce overconfident outputs in Medical Visual Question Answering. It proposes a training-based framework that fine-tunes these models for better calibration. The method uses a composite loss function that combines a Brier-style calibration term, anchor regularization, contrastive image-text alignment, and a KL divergence term, so that the model's verbalized confidence tracks how often its answers are actually correct.
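The composite loss described above can be illustrated with a minimal sketch. It assumes the four terms are combined as a weighted sum, which the text does not confirm. The exact formulation and weights are not given here, so the function names, the `weights` values, and the treatment of the anchor and contrastive terms as precomputed scalars are all hypothetical.

```python
import math


def brier_term(confidences, correct):
    """Brier-style term: mean squared gap between each stated confidence
    (in [0, 1]) and the 0/1 correctness of the corresponding answer."""
    pairs = list(zip(confidences, correct))
    return sum((c - y) ** 2 for c, y in pairs) / len(pairs)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support.
    eps guards against log(0); it is an illustrative choice."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def composite_loss(brier, anchor, contrastive, kl,
                   weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four terms (assumed form, hypothetical weights).
    anchor and contrastive are taken as already-computed scalars because
    their definitions are not specified in the text."""
    w_brier, w_anchor, w_contrastive, w_kl = weights
    return (w_brier * brier
            + w_anchor * anchor
            + w_contrastive * contrastive
            + w_kl * kl)


# Two answers: one correct with confidence 0.9, one wrong with confidence 0.6.
example_brier = brier_term([0.9, 0.6], [1, 0])
example_kl = kl_divergence([0.5, 0.5], [0.5, 0.5])
example_total = composite_loss(example_brier, 0.1, 0.2, example_kl,
                               weights=(1.0, 0.5, 0.5, 0.1))
```

In this sketch the Brier-style term is the only one tied directly to calibration: it is zero when every stated confidence equals the 0/1 correctness of its answer, and it grows as confident answers turn out to be wrong.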