The article introduces a new method for automatically mapping codes between disease classification systems, such as ICD-9-CM and ICD-10-CM. It targets a limitation of existing embedding-based approaches, which often overlook complex one-to-many mappings, where a single source code corresponds to several target codes. Borrowing the blocking-and-matching pipeline from entity resolution, the authors use large language models (LLMs) to identify valid mappings within each block of candidates.
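The blocking-and-matching idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: blocking here ranks target codes by token-level Jaccard similarity of their descriptions (a stand-in for the embedding similarity a real system would more likely use), and the LLM matcher is replaced by a hypothetical `toy_judge` rule. The codes `S1`, `T1`, and so on, and their descriptions, are invented. The point to notice is that the matcher may accept several candidates from one block, so one-to-many mappings are preserved.

```python
from typing import Callable, Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case word set of a code description."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for embedding similarity."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def block(source_desc: str, targets: Dict[str, str], k: int) -> List[str]:
    """Blocking: keep the k target codes most similar to the source description."""
    ranked = sorted(targets, key=lambda code: jaccard(source_desc, targets[code]), reverse=True)
    return ranked[:k]


def map_codes(
    sources: Dict[str, str],
    targets: Dict[str, str],
    judge: Callable[[str, str], bool],
    k: int = 3,
) -> Dict[str, List[str]]:
    """Blocking, then matching inside each block.

    The judge may accept zero, one, or several candidates, so a source code
    can map to multiple target codes (one-to-many).
    """
    mapping: Dict[str, List[str]] = {}
    for src_code, src_desc in sources.items():
        candidates = block(src_desc, targets, k)
        mapping[src_code] = [t for t in candidates if judge(src_desc, targets[t])]
    return mapping


def toy_judge(source_desc: str, target_desc: str) -> bool:
    """HYPOTHETICAL stand-in for an LLM call: accept a candidate only if every
    word of its description also appears in the source description."""
    return tokenize(target_desc) <= tokenize(source_desc)


# Invented example codes and descriptions (not real ICD entries).
SOURCES = {
    "S1": "diabetes with kidney or eye complication",
    "S2": "acute bronchitis",
}
TARGETS = {
    "T1": "diabetes with kidney complication",
    "T2": "diabetes with eye complication",
    "T3": "diabetes without complication",
    "T4": "acute bronchitis",
    "T5": "chronic bronchitis",
}

if __name__ == "__main__":
    print(map_codes(SOURCES, TARGETS, toy_judge, k=3))
    # → {'S1': ['T1', 'T2'], 'S2': ['T4']}
```

Here `S1` maps to both `T1` and `T2`, while `T3` enters the block as a candidate but is rejected by the matcher. In a real pipeline, `toy_judge` would be replaced by a prompt to an LLM that sees the source code and the candidate block.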
- A blocking step first generates, for each source code, a block of candidate target codes; an LLM then decides which candidates in that block are valid matches.
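- The approach is designed to balance the trade-offs between precision, recall, and mapping coverage that are inherent to threshold-based methods (which keep every candidate above a similarity cutoff) and top-K methods (which keep a fixed number of candidates per code).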
- Empirical results on the ICD-9-CM↔ICD-10-CM and ICD-10-AM↔ICD-11 pairs show higher precision than the baselines, with comparable recall and broader coverage.
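The precision, recall, and coverage trade-off between threshold-based and top-K selection can be made concrete with a toy comparison. The sketch assumes pair-level precision and recall, and defines coverage as the share of source codes that receive at least one target; the article's exact definitions may differ. All codes, similarity scores, and gold mappings are invented, and the LLM matching step is not modeled.

```python
from typing import Dict, Set, Tuple

Scores = Dict[str, Dict[str, float]]  # source code -> {target code: similarity}
CodeMap = Dict[str, Set[str]]         # source code -> set of target codes


def threshold_select(scores: Scores, tau: float) -> CodeMap:
    """Keep every candidate with similarity >= tau (a code may get none)."""
    return {s: {t for t, v in cands.items() if v >= tau} for s, cands in scores.items()}


def top_k_select(scores: Scores, k: int) -> CodeMap:
    """Keep the k highest-scoring candidates for every source code."""
    return {s: set(sorted(cands, key=cands.get, reverse=True)[:k]) for s, cands in scores.items()}


def evaluate(pred: CodeMap, gold: CodeMap) -> Tuple[float, float, float]:
    """Pair-level precision and recall, plus coverage: the share of source
    codes that receive at least one predicted target (assumed definition)."""
    pred_pairs = {(s, t) for s, ts in pred.items() for t in ts}
    gold_pairs = {(s, t) for s, ts in gold.items() for t in ts}
    hits = len(pred_pairs & gold_pairs)
    precision = hits / len(pred_pairs) if pred_pairs else 0.0
    recall = hits / len(gold_pairs) if gold_pairs else 0.0
    coverage = sum(1 for s in gold if pred.get(s)) / len(gold)
    return precision, recall, coverage


# Invented similarity scores and gold mappings; S1 is a one-to-many case.
SCORES: Scores = {
    "S1": {"T1": 0.90, "T2": 0.85, "T3": 0.40},
    "S2": {"T4": 0.95, "T5": 0.60},
    "S3": {"T6": 0.30, "T7": 0.45},
}
GOLD: CodeMap = {"S1": {"T1", "T2"}, "S2": {"T4"}, "S3": {"T6"}}

if __name__ == "__main__":
    for name, pred in [
        ("threshold 0.7", threshold_select(SCORES, 0.7)),
        ("top-1", top_k_select(SCORES, 1)),
        ("top-2", top_k_select(SCORES, 2)),
    ]:
        p, r, c = evaluate(pred, GOLD)
        print(f"{name}: precision={p:.2f} recall={r:.2f} coverage={c:.2f}")
```

With these invented numbers, the threshold selector is precise (1.00) but leaves `S3` unmapped (coverage 0.67), top-1 covers every code but misses the second target of `S1` and picks the wrong one for `S3`, and top-2 recovers every gold pair at the cost of precision (0.67). In a blocking-and-matching pipeline, a matcher applied to a generous block such as top-2 could in principle prune the false positives (`T5`, `T7`) while keeping the one-to-many pair, which is the role the LLM plays in the described approach.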
By providing more accurate and comprehensive mappings between disease classification codes, the approach helps users integrate health data recorded under different coding systems and conduct longitudinal analyses that span a change of coding system.