TrapQA testbed reveals language model hallucinations stem from inference misalignment

Researchers introduce TrapQA, a diagnostic testbed for investigating why large language models produce hallucinated answers that violate prompt constraints. The study frames this issue as "inference misalignment," in which statistically salient latent associations override the constraint-sensitive reasoning paths established during pretraining.

The framework uses a latent key-task model to demonstrate how pretraining-frequency imbalance can cause shortcut paths to dominate, which induces a positive inference loss.
TrapQA consists of ScientistQA, which tests entity disambiguation among similar scientists using factual probes, and Real-Life Constrained QA, which evaluates everyday constraint following under salient shortcuts.
Results indicate that hallucinations often arise from biased latent inference rather than a simple lack of knowledge.
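
The frequency-imbalance effect can be illustrated with a toy calculation. The sketch below is not the study's latent key-task model: it assumes, purely for illustration, two hypothetical latent keys ("shortcut" and "constrained"), a simple Bayes-style update, and made-up frequencies and likelihoods. The "toy_loss" function is likewise an assumed stand-in, not the paper's definition of inference loss.

```python
import math


def key_posterior(prior_freq, prompt_likelihood):
    """Posterior over latent keys: prior frequency times prompt likelihood, normalised."""
    unnorm = {k: prior_freq[k] * prompt_likelihood[k] for k in prior_freq}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}


def toy_loss(posterior, supported_key="constrained"):
    """Assumed stand-in loss: negative log-probability of the prompt-supported key."""
    return -math.log(posterior[supported_key])


# Hypothetical numbers: the prompt favours the constrained reading 4 to 1.
prompt_likelihood = {"shortcut": 0.2, "constrained": 0.8}

# Balanced pretraining frequencies: the prompt evidence decides.
balanced = key_posterior({"shortcut": 0.5, "constrained": 0.5}, prompt_likelihood)

# Heavily imbalanced frequencies: the frequent shortcut key outweighs the prompt.
imbalanced = key_posterior({"shortcut": 0.95, "constrained": 0.05}, prompt_likelihood)

for name, post in [("balanced", balanced), ("imbalanced", imbalanced)]:
    winner = max(post, key=post.get)
    print(f"{name}: winner={winner}, toy loss={toy_loss(post):.3f}")
```

In this toy setting the prompt evidence is identical in both cases; only the assumed pretraining frequencies change, and that alone is enough to flip the preferred key to the shortcut and raise the loss on the prompt-supported answer.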

The findings suggest that addressing the mismatch between prompt-supported answers and favored latent associations is critical for reducing hallucination in language models.