PseudoBench evaluates agentic auto-research systems' ability to detect pseudoscientific claims. Testing seven state-of-the-art agents, it finds near-zero refusal rates and only 27.4% resistance to pseudoscientific narratives, with stronger agents often using sophisticated scientific language to mask pseudoscience.
PseudoBench: Benchmarking Agentic Auto-Research Resistance to Pseudoscience