BIM-Edit: Benchmarking LLMs for IFC-Based BIM Editing

BIM-Edit is a benchmark for evaluating large language models on natural-language editing of Building Information Models (BIM) in the IFC (Industry Foundation Classes) format. It comprises 324 editing tasks across 11 real and 36 synthetic building models and assesses each edit for geometric accuracy, semantic validity, and topological consistency. The best model achieves an average score of only 49.5%, and no model solves more than 3.4% of the tasks, which highlights a significant gap in LLM capabilities for engineering design workflows.
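
The gap between a roughly 50% average score and a solve rate in the low single digits is easier to see with a small sketch of how the two numbers can diverge. The aggregation below is an assumption made for illustration, not the benchmark's published scoring procedure: `TaskResult`, `summarize`, the equal weighting of the three criteria, and the rule that a task counts as solved only when every criterion reaches 1.0 are all hypothetical, and the sample scores are invented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    """Hypothetical per-task scores in [0, 1] for the three assessed criteria."""
    geometric: float
    semantic: float
    topological: float

    def score(self) -> float:
        # Assumption: the task score is the unweighted mean of the criteria.
        return mean([self.geometric, self.semantic, self.topological])

    def solved(self, threshold: float = 1.0) -> bool:
        # Assumption: a task is solved only if every criterion meets the threshold.
        return min(self.geometric, self.semantic, self.topological) >= threshold


def summarize(results: list[TaskResult]) -> tuple[float, float]:
    """Return (average score, fraction of tasks solved)."""
    avg_score = mean(r.score() for r in results)
    solve_rate = sum(r.solved() for r in results) / len(results)
    return avg_score, solve_rate


# Invented scores: partial credit lifts the average, but only one task is solved.
results = [
    TaskResult(1.0, 1.0, 1.0),
    TaskResult(0.5, 1.0, 0.0),
    TaskResult(0.5, 0.5, 0.5),
    TaskResult(0.0, 0.0, 0.0),
]
avg_score, solve_rate = summarize(results)
print(f"average score: {avg_score:.1%}, solve rate: {solve_rate:.1%}")
```

Under these assumptions, partial credit on individual criteria raises the average even when few edits satisfy all criteria at once, which is the pattern the reported figures suggest.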