A mechanistic analysis shows that retrieval heads are causally necessary for long-context recall. Higher RoPE frequencies do not reduce the number of retrieval heads, and zeroing the low-frequency RoPE dimensions of retrieval heads degrades recall in a dose-dependent way. The effects were observed across five models and multiple architectures.
RoPE Does Not Prevent Retrieval Heads, Study Finds
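The summary does not spell out how the low-frequency ablation is implemented, so the following is only a minimal sketch under stated assumptions. It assumes standard RoPE with per-pair frequencies theta_i = base^(-2i/d) and base 10000, the "split-half" pairing used in Llama-style code (dimension j rotates with dimension j + d/2). It also assumes that the "dose" is the fraction of the slowest-rotating pairs set to zero. The function names `rope_frequencies`, `apply_rope` and `zero_low_freq_dims` are hypothetical illustrations, not the study's code. Because RoPE rotates each pair independently, zeroing a pair before or after rotation gives the same result.

```python
import numpy as np

def rope_frequencies(head_dim: int, base: float = 10000.0) -> np.ndarray:
    """Angular frequency of each rotated pair: theta_i = base ** (-2i / head_dim).

    Pair 0 rotates fastest (theta = 1); the last pair rotates slowest.
    """
    i = np.arange(head_dim // 2)
    return base ** (-2.0 * i / head_dim)

def apply_rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate vectors x of shape (seq, head_dim) by their positions.

    Assumes split-half pairing: dim j is paired with dim j + head_dim // 2.
    """
    d = x.shape[-1]
    freqs = rope_frequencies(d, base)                      # (d/2,)
    angles = positions[:, None] * freqs[None, :]           # (seq, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, : d // 2], x[:, d // 2 :]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def zero_low_freq_dims(x: np.ndarray, fraction: float) -> np.ndarray:
    """Hypothetical ablation: zero the lowest-frequency `fraction` of pairs.

    A larger fraction is a stronger "dose". Both members of each pair are zeroed.
    """
    d = x.shape[-1]
    half = d // 2
    k = int(round(fraction * half))                        # pairs to ablate
    out = x.copy()
    if k > 0:
        # Low-frequency pairs sit at the end of each half.
        out[..., half - k : half] = 0.0
        out[..., d - k : d] = 0.0
    return out

# Dose sweep on random queries/keys for one head (illustrative data only).
rng = np.random.default_rng(0)
seq, head_dim = 8, 64
q = rng.normal(size=(seq, head_dim))
k = rng.normal(size=(seq, head_dim))
pos = np.arange(seq)
baseline = apply_rope(q, pos) @ apply_rope(k, pos).T / np.sqrt(head_dim)
for frac in (0.0, 0.25, 0.5):
    qa = zero_low_freq_dims(apply_rope(q, pos), frac)
    ka = zero_low_freq_dims(apply_rope(k, pos), frac)
    scores = qa @ ka.T / np.sqrt(head_dim)
    print(frac, float(np.abs(scores - baseline).mean()))
```

In a real model, the ablated queries and keys would replace those of the identified retrieval heads inside the forward pass, and recall would then be measured at each dose. The sweep above shows only how attention scores shift as more low-frequency pairs are removed.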