Multi-agent LLM pipeline expands reaction classification taxonomy from 68 to 14,073 classes

Researchers present a fully automated pipeline in which a multi-agent framework of large language models classifies reactions and generates reaction rules for 665,901 US patent reactions. Each proposed rule is tested against the corpus in a verification loop, which lets the pipeline expand a standard taxonomy from 68 to 14,073 classes without human curation.

The system generates reaction rules inside a verification loop that tests each rule against the corpus of 665,901 US patent reactions.
It expands the standard taxonomy from 68 to 14,073 classes without human curation.
A lightweight fingerprint classifier achieves 97.7% accuracy on unseen reactions, matching a leading proprietary classifier while resolving chemistry more finely.
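
The verification loop can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Rule` type, the `verify` and `expand_taxonomy` functions, the substring-based matching, and the `min_support` threshold are hypothetical stand-ins, since the summary does not describe the pipeline's actual rule format or acceptance criteria. A real system would presumably match reaction templates with a cheminformatics toolkit, and the `proposals` list would come from the LLM agents rather than being written by hand.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a class name plus tokens that must appear in a reaction."""
    name: str
    reactant_token: str
    product_token: str

    def matches(self, reaction: str) -> bool:
        # Toy matching on "reactants>>products" strings; a stand-in for
        # real substructure or template matching.
        reactants, _, products = reaction.partition(">>")
        return self.reactant_token in reactants and self.product_token in products


def verify(rule: Rule, corpus: List[str], min_support: int = 2) -> bool:
    """Accept a rule only if it matches at least `min_support` corpus reactions."""
    support = sum(rule.matches(reaction) for reaction in corpus)
    return support >= min_support


def expand_taxonomy(
    proposals: Iterable[Rule], corpus: List[str], min_support: int = 2
) -> Dict[str, Rule]:
    """Keep only proposed rules that pass verification against the corpus."""
    accepted: Dict[str, Rule] = {}
    for rule in proposals:
        if rule.name in accepted:
            continue  # class already present in the taxonomy
        if verify(rule, corpus, min_support):
            accepted[rule.name] = rule
    return accepted


# Toy corpus of reaction strings (illustrative, not from the patent data).
corpus = [
    "CC(=O)O.CCO>>CC(=O)OCC",
    "CC(=O)O.CO>>CC(=O)OC",
    "CC(=O)Cl.CN>>CC(=O)NC",
]

# Hand-written proposals standing in for LLM-generated rules.
proposals = [
    Rule("esterification", "C(=O)O", "C(=O)OC"),
    Rule("amide_formation", "C(=O)Cl", "C(=O)N"),
    Rule("unsupported_rule", "Br", "I"),
]

accepted = expand_taxonomy(proposals, corpus)
print(sorted(accepted))
```

In this toy setup a rule enters the taxonomy only when the corpus supports it, so a proposal with too few matching reactions is rejected without any manual review. Lowering `min_support` admits rarer classes at the cost of weaker evidence per rule.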

The result is a living reactivity database and a general route to turning generative models into reliable, self-expanding symbolic systems.