NuclearQAv2: A Structured Benchmark for Evaluating Domain-Science Competence in Large Language Models
Researchers introduce NuclearQAv2, a benchmark designed to assess how reliably large language models handle nuclear engineering questions by testing factual knowledge, quantitative reasoning, and conceptual understanding.