LOTUS bridges latent and explicit reasoning with looped Transformers

Researchers introduce LOTUS, a method that uses looped, padded Transformers to carry out multi-step reasoning in hidden states rather than in generated tokens, bridging the performance gap between latent and explicit chain-of-thought (CoT) at the 3B-parameter scale. The model refines K latent blocks in parallel over R loop iterations and is trained with a cross-entropy loss on the gold CoT-step tokens.
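To make the loop-and-supervise idea concrete, here is a minimal toy sketch in plain Python. It is not the LOTUS implementation: the mixing rule in `shared_block`, the use of one latent vector and one gold token per block, and all names and sizes are illustrative assumptions. It only shows the shape of the computation: the same weights are applied R times to all K latents at once, and the result is scored against gold tokens with cross-entropy.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def shared_block(latents, W):
    """One pass of a toy weight-tied block (an assumption, not the real layer):
    every latent is mixed with the mean of all latents, then passed through tanh(W x)."""
    d = len(latents[0])
    mean = [sum(z[j] for z in latents) / len(latents) for j in range(d)]
    return [
        [math.tanh(v) for v in matvec(W, [zi + mi for zi, mi in zip(z, mean)])]
        for z in latents
    ]

def run_loop(latents, W, R):
    """Apply the same block R times; all K latents are updated together on each pass."""
    for _ in range(R):
        latents = shared_block(latents, W)
    return latents

def cross_entropy(logits, target):
    """Negative log-softmax of the target index, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

def latent_cot_loss(latents, lm_head, gold_tokens):
    """Mean cross-entropy between each projected latent and its gold token id."""
    losses = [cross_entropy(matvec(lm_head, z), t) for z, t in zip(latents, gold_tokens)]
    return sum(losses) / len(losses)

def demo(K=3, R=4, d=4, vocab_size=5, seed=0):
    """Build random toy weights, run the loop, and return (refined latents, loss)."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
    lm_head = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(vocab_size)]
    latents = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(K)]
    gold = [rng.randrange(vocab_size) for _ in range(K)]
    refined = run_loop(latents, W, R)
    return refined, latent_cot_loss(refined, lm_head, gold)

if __name__ == "__main__":
    refined, loss = demo()
    print(len(refined), "latents refined; toy loss =", round(loss, 4))
```

In this sketch the number of sequential passes is R regardless of how many reasoning steps K are being refined, which is the intuition behind replacing token-by-token generation with a fixed number of loop iterations.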

LOTUS is the first latent-CoT method to match explicit CoT performance at the 3B scale.
It cuts thought-phase latency by a factor of 2.5 to 6.9 relative to token-by-token generation.
Projecting post-loop latents through the base LM head recovers gold reasoning steps and surfaces alternative valid intermediate steps.
Ablations confirm that both the looped backbone and parallel supervision on gold CoT tokens are essential for this performance.
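The readout described above, projecting a latent through the LM head and inspecting the highest-scoring tokens, can be sketched as follows. This is a hedged illustration only: the four-token vocabulary, the 2-dimensional `lm_head` matrix, and the `top_k_tokens` helper are all hypothetical and are not taken from LOTUS or from any real model.

```python
def top_k_tokens(latent, lm_head, vocab, k=2):
    """Project one latent vector through a toy LM head and return the k best tokens."""
    # One logit per vocabulary entry: dot product of that entry's head row with the latent.
    logits = [sum(w * z for w, z in zip(row, latent)) for row in lm_head]
    # Rank vocabulary indices from highest to lowest logit.
    ranked = sorted(range(len(vocab)), key=lambda i: logits[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]

# Hypothetical toy vocabulary and head (assumptions for illustration only).
vocab = ["3", "4", "7", "12"]
lm_head = [
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, 0.0],
    [0.0, -1.0],
]

# A made-up post-loop latent that scores "3" highest and "4" second.
latent = [1.0, 0.8]

if __name__ == "__main__":
    print(top_k_tokens(latent, lm_head, vocab, k=2))
```

Looking at the top-1 token corresponds to checking whether the gold step is recovered, while the lower-ranked tokens are where alternative intermediate steps would show up in a readout of this kind.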

The approach demonstrates that latent spaces can be interpretable and CoT-aligned, offering a more efficient alternative to explicit token generation for complex reasoning tasks.