Researchers introduce MolSafeEval, a benchmark for evaluating the safety risks of AI-generated molecules. It addresses a gap in current benchmarks, which largely overlook hazards such as toxicity and reactivity. MolSafeEval integrates heterogeneous safety knowledge, drawn from toxicological databases and hazard rules, into a structured molecular safety knowledge graph. Large language model reasoning over this graph then systematically detects and explains unsafe molecular features.
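
The summary does not describe MolSafeEval's graph schema, so the following is only a rough illustration of the integration idea. It assumes each source contributes (feature, hazard, source) triples, and that structural features have already been detected upstream (for example, by substructure matching). The names `Triple`, `SafetyKnowledgeGraph`, and the example features are hypothetical, and the LLM reasoning step is not modeled. The sketch shows how merged facts can both flag a hazard and cite the evidence behind it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """Hypothetical fact: a structural feature linked to a hazard by some source."""
    feature: str  # e.g. a named structural alert such as "nitro_aromatic"
    hazard: str   # e.g. "mutagenicity"
    source: str   # e.g. "hazard_rule" or "tox_database"


class SafetyKnowledgeGraph:
    """Toy graph that merges facts from heterogeneous sources into feature -> hazard edges."""

    def __init__(self) -> None:
        # Each feature maps to a set of (hazard, source) pairs, so the same
        # hazard asserted by two sources is kept as two pieces of evidence.
        self._edges: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add_source(self, triples: list[Triple]) -> None:
        for t in triples:
            self._edges[t.feature].add((t.hazard, t.source))

    def explain(self, features: set[str]) -> dict[str, list[str]]:
        """Return hazard -> sorted 'feature (source)' evidence for the given features."""
        evidence: dict[str, list[str]] = defaultdict(list)
        for f in sorted(features):
            # .get avoids inserting empty entries for features with no known hazards.
            for hazard, source in sorted(self._edges.get(f, ())):
                evidence[hazard].append(f"{f} ({source})")
        return dict(evidence)


kg = SafetyKnowledgeGraph()
kg.add_source([Triple("nitro_aromatic", "mutagenicity", "hazard_rule"),
               Triple("acyl_halide", "reactivity", "hazard_rule")])
kg.add_source([Triple("nitro_aromatic", "mutagenicity", "tox_database")])
report = kg.explain({"nitro_aromatic", "acyl_halide", "benzene_ring"})
print(report)
```

Here `benzene_ring` produces no entry because no source links it to a hazard. Keeping the source on every edge is what lets an explanation cite where each flag came from.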

  • MolSafeEval organizes molecular generative models by four task types: unconditional generation, property optimization, target protein-based design, and text-based generation.
  • The benchmark provides standardized datasets and safety evaluation protocols for each of these representative task categories.
  • It uses the structured knowledge graph to uncover safety vulnerabilities that narrowly scoped toxicity predictors often miss.
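
The summary does not specify the evaluation protocols or their metrics. Purely as an assumption, here is a minimal sketch of how per-task safety results could be aggregated. It treats a generated molecule as "unsafe" if any hazard is flagged for it and reports that fraction for each of the four task types. The function `unsafe_rate_by_task` and the task identifiers are hypothetical.

```python
from collections import Counter

# Identifiers for the four task types named in the summary (spellings are assumed).
TASKS = (
    "unconditional_generation",
    "property_optimization",
    "target_based_design",
    "text_based_generation",
)


def unsafe_rate_by_task(results: list[tuple[str, set[str]]]) -> dict[str, float]:
    """results holds (task, hazards flagged for one generated molecule) pairs.

    Returns the fraction of flagged molecules for each task that has at least one result.
    """
    total: Counter[str] = Counter()
    unsafe: Counter[str] = Counter()
    for task, hazards in results:
        if task not in TASKS:
            raise ValueError(f"unknown task: {task}")
        total[task] += 1
        if hazards:  # assumed criterion: any flagged hazard marks the molecule unsafe
            unsafe[task] += 1
    return {t: unsafe[t] / total[t] for t in TASKS if total[t]}


rates = unsafe_rate_by_task([
    ("unconditional_generation", {"mutagenicity"}),
    ("unconditional_generation", set()),
    ("text_based_generation", set()),
])
print(rates)
```

Tasks with no results are omitted rather than reported as 0.0, so an empty category is not mistaken for a perfectly safe one.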

By systematically revealing the safety vulnerabilities of current generative approaches, MolSafeEval offers a new lens for benchmarking molecular models and provides essential guidance toward safer, more trustworthy molecular design.