LIHA identifies a small set of first-token broadcaster heads in GPT-2: attention heads that persistently attend to the initial prompt token and, in doing so, cause language switches. Instruction tuning reorganizes these circuits, concentrating language identity in early layers, as shown in a controlled comparison between the Qwen2.5-1.5B-Base and Qwen2-1.5B-Instruct models. First-token broadcasting is script-specific: non-Latin-script languages are processed at layer 0, matching the pattern of the instruct-tuned model.
First-Token Broadcasters in Transformers: Mechanistic Origins of Language Identity
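To make the notion of a head that "persistently attends to the initial prompt token" concrete, the following is a minimal sketch of one way such heads could be flagged from attention weights. It is not LIHA's actual procedure: the function names `first_token_mass` and `find_broadcasters`, the 0.5 threshold, and the toy attention values are all assumptions chosen for illustration.

```python
from typing import List

# Attention weights for one layer: [head][query position][key position].
Attention = List[List[List[float]]]


def first_token_mass(attn: Attention) -> List[float]:
    """Mean attention each head places on key position 0, averaged over queries 1..T-1."""
    scores = []
    for head in attn:
        # Skip query 0: under causal masking it can only attend to itself.
        later_queries = head[1:]
        if not later_queries:
            scores.append(0.0)
            continue
        scores.append(sum(row[0] for row in later_queries) / len(later_queries))
    return scores


def find_broadcasters(attn: Attention, threshold: float = 0.5) -> List[int]:
    """Indices of heads whose first-token mass meets the (hypothetical) threshold."""
    return [i for i, mass in enumerate(first_token_mass(attn)) if mass >= threshold]


# Toy causal attention for 2 heads over 3 tokens (each row sums to 1).
toy_attention = [
    [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.1, 0.1]],  # head 0: locks onto token 0
    [[1.0, 0.0, 0.0], [0.2, 0.8, 0.0], [0.1, 0.3, 0.6]],  # head 1: attends locally
]
masses = first_token_mass(toy_attention)
broadcasters = find_broadcasters(toy_attention)
```

In this toy input only head 0 is flagged. With a real model, the nested lists would be replaced by the per-layer attention tensors the model exposes, and the threshold would need to be chosen empirically.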