Scientific Fine-Tuning Increases LLM Hallucinations

SciFactCheck evaluates 18 LLMs across five scientific domains and finds that scientifically fine-tuned models show degraded factual reliability and reduced internal confidence, despite greater linguistic assertiveness. Human studies reveal limited agreement between fact-checking tools and expert judgments, highlighting how difficult it is to define a valid scientific claim.