Clinical Reasoning Graphs: Structured Evaluation of LLM Diagnostic Reasoning Reveals Competence Without Consistency
This study introduces clinical reasoning graphs to evaluate how large language models reason through diagnoses. It finds that the models reach competent diagnoses but do not apply consistent reasoning schemas. The authors extracted structured graph representations from 750 reasoning traces across five LLMs and tested whether clinically similar cases elicit stable reasoning patterns.
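To make the idea of testing for stable reasoning patterns concrete, the following is a minimal sketch, not the authors' method. It assumes each trace has already been reduced to a set of directed edges between reasoning steps, and it uses mean pairwise Jaccard overlap as a stand-in consistency score. The names `jaccard` and `mean_pairwise_similarity`, the edge encoding, and the example traces are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Assumption: a reasoning graph is a set of directed (source, target) edges.
Edge = tuple[str, str]


def jaccard(a: set[Edge], b: set[Edge]) -> float:
    """Edge-set overlap between two graphs; defined as 1.0 when both are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def mean_pairwise_similarity(graphs: list[set[Edge]]) -> float:
    """Average Jaccard overlap over all unordered pairs of graphs."""
    pairs = list(combinations(graphs, 2))
    if not pairs:
        raise ValueError("need at least two graphs to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical traces for three clinically similar cases.
trace_a = {("chest pain", "ECG"), ("ECG", "STEMI")}
trace_b = {("chest pain", "ECG"), ("ECG", "STEMI")}
trace_c = {("chest pain", "troponin"), ("troponin", "STEMI")}

score = mean_pairwise_similarity([trace_a, trace_b, trace_c])
```

In this toy example all three traces reach the same diagnosis, yet only one pair shares a reasoning path, so the score is low. That is the kind of gap between competence and consistency the summary describes; the study's actual graph schema and statistical test may differ.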