LLM-based Metrics Improve Clinical Significance Evaluation in Radiology

A study introduces lightweight, interpretable metrics that sharpen the distinction between clinically significant errors and harmless variations in radiology reports. These metrics outperform large medical LLMs and rival proprietary models, and one-pass training is shown to be effective for cost-sensitive deployment. A two-pass setting, by contrast, does not consistently improve performance; instead, it shifts the focus from error detection toward robustness.