This independent research project characterizes the internal dynamics of seven small and medium-sized language models by analyzing how hidden representations evolve during inference rather than relying on standard output benchmarks. The study investigates dynamic behavior, functional organization, and representation geometry to identify reproducible patterns across different architectures.

  • The analysis covers GPT-2, DistilGPT2, OPT-125M, Qwen2.5-0.5B-Instruct, TinyLlama-1.1B-Chat, Phi-1.5, and Llama-3.2-1B.
  • Models consistently separate into two clusters: GPT-2 and DistilGPT2 form one group, while the remaining five models form another despite architectural differences.
  • Functional information is linearly decodable from hidden representations. The amount that can be decoded varies across layers, and the most informative layers do not sit at the same absolute depth in every model.
  • Orthogonal rotations preserve decodability almost entirely, suggesting that functional signals depend on the geometry of the representation space rather than on specific individual embedding dimensions.
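
The rotation result in the last bullet can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the "hidden states" are random Gaussian vectors rather than activations from any of the seven models, the binary label stands in for an unspecified functional property, the probe is an unregularized least-squares classifier (the summary does not say which probe the study used), and the function names are hypothetical.

```python
import numpy as np


def fit_linear_probe(X, y):
    """Fit a least-squares linear probe mapping features to +/-1 targets."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    targets = 2.0 * y - 1.0                        # map {0, 1} -> {-1, +1}
    w, *_ = np.linalg.lstsq(Xb, targets, rcond=None)
    return w


def probe_accuracy(w, X, y):
    """Classification accuracy of a fitted probe on (X, y)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    preds = (Xb @ w > 0).astype(int)
    return float((preds == y).mean())


def random_orthogonal(d, rng):
    """Random orthogonal matrix: the Q factor of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q


def rotation_invariance_demo(n=400, d=16, seed=0):
    rng = np.random.default_rng(seed)

    # Synthetic stand-in for hidden states (assumption, not real activations).
    X = rng.standard_normal((n, d))

    # The label depends on one direction in the space, plus a little noise.
    direction = rng.standard_normal(d)
    y = (X @ direction + 0.5 * rng.standard_normal(n) > 0).astype(int)

    train, test = slice(0, n // 2), slice(n // 2, n)

    # Probe trained and evaluated in the original basis.
    w = fit_linear_probe(X[train], y[train])
    acc = probe_accuracy(w, X[test], y[test])

    # Rotate every representation by the same orthogonal matrix, then
    # retrain the probe in the rotated basis.
    Q = random_orthogonal(d, rng)
    X_rot = X @ Q
    w_rot = fit_linear_probe(X_rot[train], y[train])
    acc_rot = probe_accuracy(w_rot, X_rot[test], y[test])

    return acc, acc_rot, Q


if __name__ == "__main__":
    acc, acc_rot, _ = rotation_invariance_demo()
    print(f"original basis: {acc:.3f}, rotated basis: {acc_rot:.3f}")
```

In this toy setting the two accuracies agree up to floating-point error, because an unregularized least-squares probe refit after the rotation simply absorbs it. A probe that is not refit, or one whose regularizer is tied to individual coordinates, would not have that guarantee, which is why a near-complete preservation of decodability on real models is informative.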

The research next aims to move from observation to causal testing: determining whether perturbing specific functional regions alters downstream behavior, and how these organizational principles scale with model size.