Logit-Contribution Scoring identifies non-literal retrieval heads in LLMs

Researchers introduce Logit-Contribution Scoring (LOCOS), a write-aware detector that identifies attention heads performing non-literal synthesis in long-context large language models. Unlike existing methods that rely on literal token copying, LOCOS scores heads by projecting their output-value circuit onto the answer-token unembedding direction.
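To make the scoring idea concrete, here is a minimal NumPy sketch of a per-head logit contribution. It is not the authors' implementation. The function names, the tensor layout, and the choice to skip the final layer normalization are all assumptions made for illustration. Each head's attention-weighted values are passed through that head's output projection, and the result is dotted with the unembedding vector of the answer token.

```python
import numpy as np

def head_logit_contributions(attn, values, w_o, unembed_dir):
    """Illustrative per-head score (hypothetical helper, not the paper's code).

    attn:        (n_heads, seq)            attention weights from the answer position
    values:      (n_heads, seq, d_head)    value vectors at each source position
    w_o:         (n_heads, d_head, d_model) per-head output projection
    unembed_dir: (d_model,)                unembedding vector of the answer token
    Assumption: the final layer norm is ignored, so scores are only approximate.
    """
    # Attention-weighted mix of value vectors for each head.
    mixed = np.einsum("hs,hsd->hd", attn, values)
    # What each head writes into the residual stream.
    written = np.einsum("hd,hdm->hm", mixed, w_o)
    # Projection of that write onto the answer-token direction.
    return written @ unembed_dir

def top_k_heads(scores, k):
    """Indices of the k heads with the largest contribution, highest first."""
    return np.argsort(-scores)[:k]
```

A head can attend strongly to the relevant context and still score near zero here if what it writes is orthogonal to the answer direction, which is the sense in which such a score is write-aware.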

The method was tested across the Qwen3, Gemma-3, and OLMo-3.1 model families on the NoLiMa benchmark.
On Qwen3-8B, ablating the top 50 LOCOS heads reduced ROUGE-L from 0.401 to 0.000, whereas ablating heads selected by baseline methods left it at 0.292.
Ablation also reduced MuSiQue scores from 0.55 to 0.08 and BABILong scores from 0.62 to 0.20.
Parametric recall and arithmetic reasoning remained at baseline levels under the same ablation, indicating that the identified heads are specific to retrieval.
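The ablation experiments above can be pictured with a small sketch. It assumes zero-ablation, meaning the selected heads' outputs are replaced with zeros. The summary does not say whether the study used zero-ablation or mean-ablation, and the helper names below are hypothetical.

```python
import numpy as np

def ablate_heads(head_outputs, head_ids):
    """Zero out selected heads (assumed zero-ablation, for illustration only).

    head_outputs: (n_heads, d_model) what each head writes to the residual stream
    head_ids:     iterable of head indices to remove
    Returns a new array and leaves the input untouched.
    """
    ablated = head_outputs.copy()
    ablated[list(head_ids)] = 0.0
    return ablated

def answer_logit(head_outputs, unembed_dir):
    """Answer-token logit from summed head writes (layer norm ignored)."""
    return float(head_outputs.sum(axis=0) @ unembed_dir)
```

Comparing `answer_logit` before and after `ablate_heads` on the top-scoring heads gives a toy version of the before-and-after comparison reported for ROUGE-L, MuSiQue, and the other benchmarks.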

LOCOS provides a more precise mechanism for interpreting long-context model behavior by targeting the specific heads responsible for synthesizing answers rather than merely reading context.