LLM-Assisted Label Cleaning in a Chest CT Dataset
A large language model (LLM) was used to identify label-report discordance in the CT-RATE chest CT dataset. GPT-5.4 agreed with the existing labels in 96.4% of cases. Where the two disagreed, radiologist adjudication supported the LLM-derived label in 74.2% of general discordances and 91.9% of lymphadenopathy discordances. Labels derived by majority vote across multiple LLMs achieved higher F1 scores and kappa values than the other label sets compared, and the cleaned dataset will be publicly released.
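
The majority-vote and agreement metrics mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy label lists, the three-voter setup, and the function names are hypothetical and are not taken from the study or from CT-RATE. It only shows how a consensus label, raw agreement, F1 score, and Cohen's kappa might be computed for one binary finding.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common label among an odd number of binary votes."""
    if len(votes) % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    return Counter(votes).most_common(1)[0][0]


def agreement(a, b):
    """Fraction of positions where two label lists match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def f1_score(reference, predicted):
    """F1 for the positive class (label 1) against a reference list."""
    tp = sum(r == 1 and p == 1 for r, p in zip(reference, predicted))
    fp = sum(r == 0 and p == 1 for r, p in zip(reference, predicted))
    fn = sum(r == 1 and p == 0 for r, p in zip(reference, predicted))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(a, b):
    """Cohen's kappa for two binary label lists (chance-corrected agreement)."""
    n = len(a)
    p_observed = agreement(a, b)
    p_a, p_b = sum(a) / n, sum(b) / n
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1:
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: existing dataset labels for one finding,
# and votes from three hypothetical LLMs per report.
existing_labels = [1, 0, 1, 0, 0, 1, 0, 0]
llm_votes = [
    [1, 1, 1], [0, 0, 0], [1, 0, 1], [1, 1, 0],
    [0, 0, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0],
]

consensus = [majority_vote(v) for v in llm_votes]

# Reports where the consensus disagrees with the existing label are the
# candidates that would go to a radiologist for adjudication.
discordant = [i for i, (e, c) in enumerate(zip(existing_labels, consensus)) if e != c]

print("agreement:", agreement(existing_labels, consensus))
print("discordant report indices:", discordant)
print("F1:", round(f1_score(existing_labels, consensus), 3))
print("kappa:", cohen_kappa(existing_labels, consensus))
```

In this sketch the existing labels serve as the comparison list only for convenience. In an evaluation like the one summarized above, F1 and kappa would more plausibly be computed against an adjudicated reference standard, which is not reproduced here.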