A systematic study challenges the assumption that all layers contribute equally to reinforcement learning (RL) post-training in large language models. The authors find that training a single transformer layer can recover most of the gains achieved by full-parameter RL, and sometimes surpass it.

  • The researchers introduce a metric, "layer contribution," defined as the fraction of the full RL improvement that is recovered when a single layer is trained in isolation.
  • Across seven models from the Qwen3 and Qwen2.5 families, trained with the GRPO, GiGPO, and Dr. GRPO algorithms, gains were highly concentrated in a small subset of layers, and in some cases in a single layer.
  • High-contribution layers consistently concentrate in the middle of the transformer stack, while input and output layers contribute substantially less.
  • Layer rankings remained strongly correlated across datasets, tasks, model families, and RL algorithms.
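
The metric and the ranking comparison described in the bullets can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the formula assumes that "fraction of full RL improvement recovered" means the single-layer gain divided by the full-RL gain over the same base model, and Spearman rank correlation is an assumed choice for comparing layer rankings (the summary does not name the statistic). The function names and all scores are hypothetical.

```python
from typing import List


def layer_contribution(base: float, full: float, single: float) -> float:
    """Fraction of the full-RL gain recovered by training one layer alone.

    Assumed form: (single - base) / (full - base). Values above 1.0 mean
    the single-layer run surpassed full-parameter RL.
    """
    gain = full - base
    if gain == 0:
        raise ValueError("full RL shows no gain over the base model")
    return (single - base) / gain


def ranks(values: List[float]) -> List[float]:
    """Ranks starting at 1 for the smallest value; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(a: List[float], b: List[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical benchmark scores, for illustration only (not from the study).
recovered = layer_contribution(base=40.0, full=50.0, single=49.0)
surpassed = layer_contribution(base=40.0, full=50.0, single=51.0)

# Hypothetical per-layer contributions measured on two different tasks.
task_a = [0.1, 0.5, 0.9, 0.4]
task_b = [0.2, 0.6, 0.8, 0.3]
rank_agreement = spearman(task_a, task_b)
```

Under this definition, a contribution near 1.0 means the single layer recovers almost all of the full-RL gain, and a rank correlation near 1.0 means two settings order the layers almost identically.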

These findings suggest that RL adaptation is not spread uniformly across the network but is concentrated in a few layers, mostly in the middle of the transformer stack.
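
Training one layer in isolation amounts to freezing every parameter outside a chosen transformer block. The sketch below selects the parameters to leave trainable by name. It assumes parameter names of the form `model.layers.<index>.<...>`, the convention used by Hugging Face-style decoder implementations; the helper `trainable_names` and the example names are hypothetical and do not come from the study.

```python
import re
from typing import Iterable, List

# Matches the block index in names such as "model.layers.12.mlp.up_proj.weight".
# Assumption: blocks live under a ".layers.<index>." path segment.
_LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def trainable_names(param_names: Iterable[str], layer_idx: int) -> List[str]:
    """Return the parameter names that belong to transformer block `layer_idx`.

    Everything else (embeddings, other blocks, final norm, output head)
    is left out and would stay frozen.
    """
    selected = []
    for name in param_names:
        match = _LAYER_RE.search(name)
        if match and int(match.group(1)) == layer_idx:
            selected.append(name)
    return selected


# Hypothetical parameter names for a small decoder-only model.
example_names = [
    "model.embed_tokens.weight",
    "model.layers.1.mlp.up_proj.weight",
    "model.layers.12.self_attn.q_proj.weight",
    "model.layers.12.mlp.down_proj.weight",
    "model.norm.weight",
    "lm_head.weight",
]
middle_layer = trainable_names(example_names, layer_idx=12)
```

With a PyTorch model, the selection could then be applied by iterating over `model.named_parameters()` and setting `requires_grad` to `True` only for names in the returned list, so that the RL optimizer updates that one layer.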