Multi-Block Diffusion Language Models

Researchers propose Multi-Block Diffusion Language Models (MBD-LMs), which extend single-block diffusion text generation by decoding a running set of consecutive blocks concurrently, adding inter-block parallelism. A post-training method called Multi-block Teacher Forcing (MultiTF) bridges the gap between training and inference states.

MultiTF integrates teacher forcing and diffusion forcing on bounded noise groups with randomized noise schedulers, so that training matches MultiBD inference. An optimized decoding algorithm built on the Block Buffer mechanism preserves prefix-cache reuse and keeps input shapes static. Empirically, MBD-LLaDA2-Mini raises average Tokens Per Forward pass (TPF) from 3.47 to 6.19 (about 1.78x) while improving accuracy from 79.95% to 81.03%. Combined with DMax, the model reaches an average TPF of 9.34 with only a 1.02% accuracy drop on math and code benchmarks.
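
To make the running set and the TPF metric concrete, the following is a toy sketch in Python, not the authors' decoding algorithm. It assumes a fixed-size buffer of consecutive blocks in which each slot reveals a fixed number of tokens per forward pass; the function name simulate_decoding, the unmask_per_pass rates, and the commit-and-refill rule are all hypothetical choices made for illustration. A finished leading block is dropped from the buffer (standing in for a move to the prefix cache) and a fresh fully masked block is appended, so the buffer length stays constant until the sequence runs out of blocks.

```python
from collections import deque


def simulate_decoding(num_blocks, block_size, buffer_blocks, unmask_per_pass):
    """Toy model: count the forward passes needed to decode num_blocks blocks.

    unmask_per_pass[i] is an ASSUMED number of tokens revealed per pass in
    buffer slot i (slot 0 holds the oldest, least noisy block). Real models
    reveal a data-dependent number of tokens; this is illustration only.
    """
    assert len(unmask_per_pass) == buffer_blocks
    next_block = 0
    buffer = deque()  # each entry: masked tokens remaining in that block

    # Fill the buffer with fully masked blocks.
    while len(buffer) < buffer_blocks and next_block < num_blocks:
        buffer.append(block_size)
        next_block += 1

    passes = 0
    while buffer:
        passes += 1
        # One forward pass covers every block currently in the buffer.
        for slot in range(len(buffer)):
            buffer[slot] = max(0, buffer[slot] - unmask_per_pass[slot])
        # Commit finished leading blocks and refill, keeping the buffer
        # length fixed while undecoded blocks remain.
        while buffer and buffer[0] == 0:
            buffer.popleft()
            if next_block < num_blocks:
                buffer.append(block_size)
                next_block += 1

    total_tokens = num_blocks * block_size
    return passes, total_tokens / passes  # (forward passes, TPF)


if __name__ == "__main__":
    single = simulate_decoding(4, 8, buffer_blocks=1, unmask_per_pass=[4])
    multi = simulate_decoding(4, 8, buffer_blocks=2, unmask_per_pass=[4, 2])
    print("single-block (passes, TPF):", single)
    print("multi-block  (passes, TPF):", multi)
```

In this toy setting the two-block buffer finishes the same 32 tokens in fewer forward passes than the one-block buffer, because the trailing block is already partly decoded when it becomes the leading block. The rates are invented, so the resulting TPF values say nothing about the reported 3.47 and 6.19.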

This work translates increased decoding parallelism into wall-clock acceleration while maintaining or improving generation accuracy.