Researchers propose DiscoLoop, a looped transformer architecture that carries both a discrete channel (token embeddings) and a continuous channel (hidden states) to improve two-hop reasoning within a single forward pass. The method targets a representational bottleneck in standard looped transformers by realigning hidden states with the embeddings of bridge tokens, without additional training.
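The mixed-channel recurrence can be pictured with a small sketch. This is not the authors' implementation: the function names, the hard argmax decoding, the tied embedding/unembedding matrix, and the fixed mixing weight `alpha` are all illustrative assumptions. Each iteration applies one shared block to the continuous hidden state, decodes the most likely token, and mixes that token's embedding back in, which pulls the hidden state toward the decoded bridge entity.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mixed_channel_loop(
    h: Vector,
    block: Callable[[Vector], Vector],
    embed: Matrix,
    n_loops: int,
    alpha: float = 0.5,
) -> Tuple[Vector, List[int], List[Tuple[float, float]]]:
    """Hypothetical discrete + continuous looped recurrence (illustrative only)."""
    trace: List[int] = []
    alignments: List[Tuple[float, float]] = []
    for _ in range(n_loops):
        h = block(h)  # continuous channel: same weights reused every iteration
        logits = matvec(embed, h)  # tied unembedding (assumption)
        tok = max(range(len(logits)), key=logits.__getitem__)  # hard decode
        e = embed[tok]  # discrete channel: embedding of the decoded token
        before = cosine(h, e)
        h = [(1 - alpha) * hi + alpha * ei for hi, ei in zip(h, e)]  # mix channels
        alignments.append((before, cosine(h, e)))
        trace.append(tok)
    return h, trace, alignments


# Toy setup: 3 entities with one-hot embeddings; the shared block maps 0 -> 1 -> 2.
EMBED: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W: Matrix = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]


def block(h: Vector) -> Vector:
    """Shared transformer-block stand-in: a fixed linear map followed by tanh."""
    return [math.tanh(v) for v in matvec(W, h)]


final_h, trace, alignments = mixed_channel_loop([1.0, 0.0, 0.2], block, EMBED, n_loops=2)
print(trace)  # [1, 2]: bridge entity, then answer entity
```

In this toy setting the shared block encodes the relations 0 → 1 → 2, so two iterations walk from entity 0 through bridge entity 1 to answer 2, and each mixing step raises the cosine similarity between the hidden state and the decoded token's embedding.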

  • Standard non-recurrent Transformers suffer from depth-local storage: facts learned in earlier layers are unavailable for second-hop retrieval, which has to happen in later layers.
  • Previous looped transformers generalized imperfectly because their hidden states remained poorly aligned with bridge-token embeddings, even though the correct bridge entity could be decoded from them.
  • DiscoLoop uses a mixed-channel design that reaches near-perfect accuracy on symbolic and synthetic-language tasks with substantially fewer training steps than the baselines.
  • In real-world pretraining, the architecture attains lower training loss and stronger benchmark performance than looped-transformer baselines.
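The depth-local storage problem, and why reusing one block helps, can be caricatured with plain lookup tables. This toy abstraction is an assumption for illustration, not a model from the paper: each non-recurrent layer is a separate hypothetical fact table that a hop may consult only after the layer that resolved the previous hop, while a looped model reuses one shared table at every iteration.

```python
from typing import Dict, List, Optional

FactTable = Dict[str, str]  # hypothetical: one relation lookup per layer


def run_stack(layers: List[FactTable], entity: str, hops: int) -> Optional[str]:
    """Non-recurrent stack: each hop may only use layers after the one that resolved the previous hop."""
    current, start = entity, 0
    for _ in range(hops):
        for i in range(start, len(layers)):
            if current in layers[i]:
                current, start = layers[i][current], i + 1
                break
        else:
            return None  # no remaining layer stores the needed fact
    return current


def run_looped(block: FactTable, entity: str, hops: int) -> Optional[str]:
    """Looped model: the same shared block (and its facts) is reused at every hop."""
    current = entity
    for _ in range(hops):
        if current not in block:
            return None
        current = block[current]
    return current


facts = {"e1": "e2", "e2": "e3"}  # e1 -> bridge e2 -> answer e3
print(run_stack([facts, {}], "e1", hops=2))  # None: second fact only exists in layer 0
print(run_stack([{"e1": "e2"}, {"e2": "e3"}], "e1", hops=2))  # e3
print(run_looped(facts, "e1", hops=2))  # e3
```

When both facts live in the first layer, the stack resolves the first hop and then finds no later layer holding the second fact, whereas the looped version finds it on the next iteration. Storing the second fact in a later layer lets the stack succeed as well.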

The authors consider this significant because the mixed-channel design transfers to practical language modeling, allowing models to internalize multi-step reasoning more effectively.