Researchers propose DiscoLoop, a looped transformer architecture that carries both discrete embedding channels and continuous hidden-state channels to improve two-hop reasoning within a single forward pass. The method addresses the representational bottleneck found in standard looped transformers by realigning hidden states with bridge token embeddings without additional training.
- Standard non-recurrent Transformers suffer from depth-local storage problems where facts learned in earlier layers are unavailable for second-hop retrieval.
- Previous looped transformers generalized imperfectly because their hidden states remained poorly aligned with bridge token embeddings, even when the correct bridge entity could be decoded from them.
- DiscoLoop uses a mixed-channel design that reaches near-perfect accuracy in substantially fewer training steps on symbolic and synthetic-language tasks.
- In real-world pretraining, the architecture attains lower training loss and stronger benchmark performance than looped-transformer baselines.
The authors consider this significant because the mixed-channel design transfers to practical language modeling, allowing models to internalize multi-step reasoning more effectively.
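
To make the alignment problem and the mixed-channel idea concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the summary does not say how the two channels are combined, so the concatenation rule, the one-hot vocabulary, and the names `EMBED`, `decode`, and `next_loop_input` are all assumptions made for illustration. It shows a hidden state from which the bridge token ("bob") can be decoded even though the vector is only loosely aligned with that token's embedding, and how re-embedding the decoded token gives an exactly aligned discrete channel with no extra training.

```python
import math

# Hypothetical 4-token vocabulary with one-hot embeddings (illustration only).
EMBED = {
    "alice": [1.0, 0.0, 0.0, 0.0],
    "bob":   [0.0, 1.0, 0.0, 0.0],
    "paris": [0.0, 0.0, 1.0, 0.0],
    "rome":  [0.0, 0.0, 0.0, 1.0],
}


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def decode(hidden):
    """Return the token whose embedding scores highest against the hidden state."""
    return max(EMBED, key=lambda tok: dot(hidden, EMBED[tok]))


def next_loop_input(hidden):
    """Assumed mixing rule (not confirmed by the source): concatenate the
    re-embedded decoded token (discrete channel) with the unchanged hidden
    state (continuous channel) to form the input to the next loop."""
    token = decode(hidden)
    discrete = list(EMBED[token])   # exact token embedding, no training needed
    continuous = list(hidden)       # raw hidden state carried forward
    return token, discrete, discrete + continuous


# A hidden state after the first hop: "bob" is decodable, but the vector
# points only loosely in the direction of bob's embedding.
hidden = [0.2, 0.3, 0.25, 0.1]
token, discrete, mixed = next_loop_input(hidden)

print("decoded bridge token:", token)
print("cosine(hidden, bridge embedding):", round(cosine(hidden, EMBED["bob"]), 3))
print("cosine(discrete channel, bridge embedding):", cosine(discrete, EMBED["bob"]))
print("mixed input length:", len(mixed))
```

In this toy setting the discrete channel supplies the second hop with a clean bridge-token embedding, while the continuous channel keeps whatever extra information the hidden state carried. A real model would presumably use learned embeddings and a learned way of combining the channels rather than plain concatenation.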