NVIDIA Releases Nemotron-TwoTower-30B-A3B, a Diffusion-Based Language Model

NVIDIA has released Nemotron-TwoTower-30B-A3B-Base-BF16, a model built on the Nemotron 3 Nano 30B-A3B backbone. The architecture departs from standard autoregressive models by pairing a frozen context tower with a diffusion denoiser tower. Rather than generating tokens strictly one at a time, the system iteratively fills blocks of tokens in parallel. According to NVIDIA, the default mask-diffusion setup retains 98.7% of the autoregressive baseline's aggregate benchmark quality while achieving 2.42 times that baseline's wall-clock generation throughput. The release pairs diffusion-style parallel decoding with a large-scale language model backbone.
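
To make the idea of filling a block iteratively and in parallel more concrete, here is a minimal Python sketch. It is not NVIDIA's implementation. The `MASK` id, the `toy_denoiser`, `fill_block`, and `generate` functions, the block size, the step count, and the confidence-based unmasking schedule are all hypothetical choices made for illustration, and the denoiser returns random guesses instead of running a real model. The frozen context tower is represented only by a read-only `context` list.

```python
import random

MASK = -1  # hypothetical mask token id (assumption, not from the release)


def toy_denoiser(context, block, rng):
    """Stand-in for a denoiser tower: one call scores every block position.

    Returns a (token, confidence) guess per position. The values are random;
    a real model would condition on `context` and on the unmasked tokens.
    """
    return [(rng.randrange(100), rng.random()) for _ in block]


def fill_block(context, block_size, steps, rng):
    """Fill one block in at most `steps` denoiser calls; returns (block, calls)."""
    block = [MASK] * block_size
    calls = 0
    for step in range(steps):
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        if not masked:
            break
        preds = toy_denoiser(context, block, rng)
        calls += 1
        # Commit the most confident guesses, spreading the still-masked
        # positions evenly over the steps that remain (assumed schedule).
        k = -(-len(masked) // (steps - step))  # ceiling division
        for i in sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]:
            block[i] = preds[i][0]
    return block, calls


def generate(prompt, num_blocks, block_size=8, steps=4, seed=0):
    """Append `num_blocks` blocks to the prompt; returns (tokens, denoiser calls)."""
    rng = random.Random(seed)
    context = list(prompt)
    total_calls = 0
    for _ in range(num_blocks):
        block, calls = fill_block(context, block_size, steps, rng)
        context.extend(block)  # a finished block becomes context for the next
        total_calls += calls
    return context, total_calls


tokens, calls = generate([1, 2, 3], num_blocks=2)
print(len(tokens) - 3, "new tokens from", calls, "denoiser calls")
```

With these toy settings, 16 new tokens come from 8 denoiser calls, where strictly one-at-a-time decoding would need 16 forward passes. That ratio is a property of the chosen block size and step count only; it says nothing about how NVIDIA's reported 2.42 times throughput figure was obtained.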