RefRad2D is a large-scale bilingual dataset of 1.2M CT and MR image-text pairs collected from clinical practice. RadGrounder, trained on this data, achieves competitive results in visual question answering (VQA) and report generation, and its spatial grounding supervision does not degrade language quality.
RefRad2D Dataset Enables Scalable Spatial Grounding in Radiology