Clark Labs has released a compressed version of the Sana 1.6B text-to-image transformer, with its weights quantized to ternary values at roughly 1.85 bits per weight. The packed model is about 8.6 times smaller than the FP16 reference, and Clark Labs reports output quality close to FP16.
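As a quick sanity check, the headline figures can be reproduced from the packed and reference sizes (374 MB and 3.21 GB). This sketch assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and 2 bytes per parameter in the FP16 reference, so the parameter count is inferred from the FP16 size rather than taken from the release.

```python
# Back-of-the-envelope check of the release's headline numbers.
# Assumptions: decimal units (1 MB = 1e6 B, 1 GB = 1e9 B) and
# 2 bytes per parameter in the FP16 reference checkpoint.

FP16_BYTES = 3.21e9    # reference FP16 transformer size
PACKED_BYTES = 374e6   # packed ternary checkpoint size

ratio = FP16_BYTES / PACKED_BYTES
n_params = FP16_BYTES / 2                     # FP16 stores each weight in 2 bytes
bits_per_weight = PACKED_BYTES * 8 / n_params

print(f"compression ratio: {ratio:.2f}x")                 # 8.58x
print(f"implied parameters: {n_params / 1e9:.3f} B")      # 1.605 B
print(f"average bits per weight: {bits_per_weight:.2f}")  # 1.86
```

The implied parameter count matches the model's 1.6B name, and the average of about 1.86 bits per weight agrees with the quoted ~1.85 within the rounding of the published sizes.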
- The packed model size is 374 MB compared to the 3.21 GB reference FP16 transformer.
- Weights are quantized to ternary values {-1, 0, +1} with group-wise scales; a small tail of about 5% of parameters, in the conditioning and projection layers, is kept at higher precision.
- An unpacked version is provided as dequantized bf16 weights for drop-in compatibility with the diffusers library.
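The release does not spell out its quantization recipe. The following is a minimal sketch of ternary quantization with group-wise scales, assuming the absmean rule from BitNet b1.58 (scale = mean |w|, then round and clip to {-1, 0, +1}) applied per group rather than per tensor, and a base-3 packing of five codes per byte (3^5 = 243 fits in one byte, i.e. 1.6 bits per code). The function names, the group size, and the packing layout are hypothetical, and the ~5% high-precision tail is not modeled.

```python
from typing import List, Tuple


def quantize_ternary(weights: List[float], group_size: int = 64) -> Tuple[List[int], List[float]]:
    """Hypothetical absmean ternary quantizer: one float scale per group of weights."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = sum(abs(w) for w in group) / len(group)
        if scale == 0.0:
            scale = 1.0  # all-zero group: any scale works, avoid dividing by zero
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            codes.append(max(-1, min(1, q)))  # clip to the ternary range
    return codes, scales


def dequantize_ternary(codes: List[int], scales: List[float], group_size: int = 64) -> List[float]:
    """Map each code back to code * (scale of its group)."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def pack_trits(codes: List[int]) -> bytes:
    """Pack ternary codes five per byte as base-3 digits (assumed layout)."""
    out = bytearray()
    for start in range(0, len(codes), 5):
        value = 0
        for c in reversed(codes[start:start + 5]):
            value = value * 3 + (c + 1)  # map {-1, 0, 1} -> {0, 1, 2}
        out.append(value)                # value <= 242, so it fits in a byte
    return bytes(out)


def unpack_trits(packed: bytes, n: int) -> List[int]:
    """Inverse of pack_trits; n trims the padding digits of the last byte."""
    codes: List[int] = []
    for byte in packed:
        value = byte
        for _ in range(5):
            codes.append(value % 3 - 1)
            value //= 3
    return codes[:n]


# Usage: quantize two groups of four weights, pack, unpack, dequantize.
weights = [0.9, -0.05, -1.2, 0.4, 0.0, -0.6, 0.02, 1.5]
codes, scales = quantize_ternary(weights, group_size=4)
packed = pack_trits(codes)
restored = dequantize_ternary(unpack_trits(packed, len(codes)), scales, group_size=4)
print(codes)        # [1, 0, -1, 1, 0, -1, 0, 1]
print(len(packed))  # 2 bytes for 8 codes
```

Dequantizing back to floats, as `dequantize_ternary` does, is conceptually what an unpacked bf16 checkpoint holds: the values still lie on the scaled ternary grid, but each weight again occupies 16 bits, so that variant trades the size savings for compatibility.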
With a much smaller memory footprint, the packed release makes high-quality text-to-image generation with Sana 1.6B more practical to run locally.