Sumi: Open Uniform Diffusion Language Model from Scratch

Sumi is a 7B-parameter uniform diffusion language model pretrained from scratch on 1.5T tokens. It competes with autoregressive models on knowledge, reasoning, and coding tasks but underperforms on commonsense benchmarks, likely due to its education-heavy data mixture. The model weights, checkpoints, and full training recipe are publicly released.