Researchers introduce LOTUS, a method that uses looped padded Transformers to perform multi-step reasoning in hidden states, closing the performance gap between latent and explicit chain-of-thought (CoT) at the 3B parameter scale. The model refines K latent blocks in parallel over R loop iterations and is trained with a cross-entropy loss on the gold CoT-step tokens.
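
The loop-then-supervise idea can be pictured with a toy sketch. This is not the LOTUS implementation: a single weight-tied tanh layer stands in for the looped Transformer, mean pooling stands in for attention across blocks, and each block is reduced to one vector supervised by one gold token. All names (`looped_latents`, `cot_cross_entropy`, `W_shared`, `W_head`) and all sizes are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def looped_latents(blocks, W_shared, R):
    """Update all K blocks together, reusing the same weights for R iterations."""
    for _ in range(R):
        d = len(blocks[0])
        # Assumption: mean over blocks as a crude stand-in for cross-block attention.
        ctx = [sum(b[i] for b in blocks) / len(blocks) for i in range(d)]
        # Every block is updated from the previous iteration's state (parallel update).
        blocks = [
            [math.tanh(u + c) for u, c in zip(matvec(W_shared, b), ctx)]
            for b in blocks
        ]
    return blocks


def cot_cross_entropy(blocks, W_head, gold_tokens):
    """Mean cross-entropy of each block's head prediction against its gold token id."""
    losses = []
    for b, gold in zip(blocks, gold_tokens):
        probs = softmax(matvec(W_head, b))
        losses.append(-math.log(probs[gold]))
    return sum(losses) / len(losses)


# Toy sizes and random weights, chosen only for illustration.
random.seed(0)
K, R, d, V = 3, 4, 4, 6
blocks = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(K)]
W_shared = [[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]
W_head = [[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(V)]
gold_tokens = [2, 0, 5]  # one hypothetical gold token id per block

latents = looped_latents(blocks, W_shared, R)
loss = cot_cross_entropy(latents, W_head, gold_tokens)
print(f"mean cross-entropy over {K} blocks after {R} loops: {loss:.4f}")
```

In this sketch the sequential depth is R loop iterations regardless of how many blocks there are, whereas explicit CoT needs one decoding step per generated token. That is the intuition behind the latency claim, not a reproduction of the reported numbers.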

  • LOTUS is the first latent-CoT method to match explicit CoT performance at the 3B scale.
  • It reduces thought-phase latency by 2.5x to 6.9x compared to token-by-token generation.
  • Projecting post-loop latents through the base LM head recovers gold reasoning steps and surfaces alternative valid intermediate steps.
  • Ablations confirm that both the looped backbone and parallel supervision on gold CoT tokens are essential for this performance.
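
The interpretability finding, that projecting a post-loop latent through the LM head yields the gold step and sometimes a valid alternative, can be illustrated with a minimal sketch. The vocabulary, head matrix, and latent below are hand-made toy values, not outputs of LOTUS, and `top_tokens` is a hypothetical helper.

```python
def top_tokens(latent, lm_head, vocab, k=2):
    """Rank vocabulary entries by the logit that the LM head assigns to a latent."""
    logits = [sum(w * x for w, x in zip(row, latent)) for row in lm_head]
    ranked = sorted(range(len(vocab)), key=lambda i: logits[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]


# Hypothetical 4-entry vocabulary of reasoning steps and a 3-dimensional hidden state.
vocab = ["6", "2*3", "3+3", "9"]
lm_head = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [-1.0, -1.0, -1.0],
]
latent = [0.1, 0.9, 0.7]  # toy post-loop latent

# The top entry plays the role of the gold step, the runner-up an alternative step.
print(top_tokens(latent, lm_head, vocab, k=2))
```

Reading off more than the single best token is what makes alternative intermediate steps visible in a projection like this.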

The approach demonstrates that latent spaces can be interpretable and CoT-aligned, offering a more efficient alternative to explicit token generation for complex reasoning tasks.