Vision-Default, Prior-Override: Causal Mechanisms of Perception-Knowledge Conflict in Vision-Language Models

This study investigates how vision-language models resolve conflicts between visual evidence and memorized world knowledge by combining activation patching with mechanistic analysis across three model families. It identifies a sparse causal circuit: visual grounding is the default, and overriding it with prior knowledge requires a specific set of attention heads.

Visual grounding emerges by default, whereas prior grounding depends on a small set of causally necessary attention heads (2.5-4.8%) concentrated in the second half of the network.
These heads enable answers from stored world knowledge despite conflicting visual input, establishing an asymmetric causal structure.
Ablating these heads flips predictions from knowledge-grounded to visually grounded answers in 68-96% of cases under prior-knowledge prompts.
The identified heads decompose into routing heads that modulate information flow and writing heads that directly project answer tokens into the residual stream.
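
To make the ablation measurement above concrete, the following is a minimal sketch of how a flip rate under head ablation could be computed. It is a toy stand-in, not the study's pipeline: the additive logit model, the zero-ablation choice, the head indices, and the names `predict`, `flip_rate`, `base_logits`, and `head_contribs` are all illustrative assumptions. Each example has two candidate answers (index 0 = visually grounded, index 1 = prior-grounded), and each head adds a fixed contribution to the answer logits.

```python
# Toy model (assumption): final logits = base logits + sum of per-head contributions.
# Answer index 0 is the visually grounded answer, index 1 the prior-grounded one.

def predict(base_logits, head_contribs, ablated):
    """Return the argmax answer index per example, skipping ablated heads."""
    preds = []
    for base, heads in zip(base_logits, head_contribs):
        logits = list(base)
        for h, contrib in enumerate(heads):
            if h in ablated:
                continue  # zero-ablation: drop this head's contribution
            for a, value in enumerate(contrib):
                logits[a] += value
        preds.append(max(range(len(logits)), key=logits.__getitem__))
    return preds


def flip_rate(base_logits, head_contribs, ablated, visual_idx=0, prior_idx=1):
    """Fraction of prior-grounded predictions that become visually grounded
    once the heads in `ablated` are removed."""
    before = predict(base_logits, head_contribs, set())
    after = predict(base_logits, head_contribs, set(ablated))
    prior_cases = [i for i, p in enumerate(before) if p == prior_idx]
    if not prior_cases:
        return float("nan")  # nothing to flip
    flipped = sum(1 for i in prior_cases if after[i] == visual_idx)
    return flipped / len(prior_cases)


# Hypothetical data: 4 examples, 8 heads, 2 answers.
n_examples, n_heads = 4, 8
base_logits = [[1.0, 0.0] for _ in range(n_examples)]  # vision wins by default
head_contribs = [[[0.0, 0.0] for _ in range(n_heads)] for _ in range(n_examples)]
for e in range(n_examples):
    head_contribs[e][5][1] = 2.0   # head 5 pushes the prior answer everywhere
head_contribs[3][2][1] = 1.5       # head 2 also supports the prior on example 3

print(flip_rate(base_logits, head_contribs, ablated=[5]))  # 0.75
print(flip_rate(base_logits, head_contribs, ablated=[0]))  # 0.0
```

In this toy setup, removing head 5 returns three of the four prior-grounded predictions to the visual answer, while removing an uninvolved head changes nothing. With a real model, the per-head contributions would come from intervening on attention-head outputs during a forward pass rather than from a fixed table.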

This structure is consistent across model families and scales, revealing a sparse causal circuit underlying perception-knowledge conflict in VLMs.