SpectralQuant Qwen3.5 0.8B Q4_K_M recovers 96.5% of the gap to BF16

Spectral Labs has published a release candidate of a calibration-aware Q4_K_M quantization of the Qwen3.5 0.8B model, built with a new method called SpectralQuant. The approach aims to make a standard Q4_K_M footprint behave more like a larger quant format while staying compatible with llama.cpp.

SpectralQuant uses calibration signals to identify behaviorally sensitive directions, then shapes the quantization error to protect critical weights rather than spreading it evenly.
The model recovers 96.5% of the loss gap between standard Q4_K_M and BF16 on the heldout120 evaluation suite, reducing loss from 3.4135 to 2.9961.
At 4.52 bits per weight (BPW) and 415.7 MiB, SpectralQuant outperforms Unsloth's Q4_K_S, Q4_K_M, IQ4_NL, and IQ4_XS quants on heldout120, even though those quants use more bytes.
The output is a strictly standard GGUF file that runs with llama-cli or llama-server and contains no mixed-precision sidecars or dynamic quant formats.
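
The headline figures can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes the recovery percentage is computed linearly on loss, as (baseline loss minus quantized loss) divided by (baseline loss minus BF16 loss), which the announcement does not spell out. The BF16 loss and the weight count are not reported in the source; the values derived here are implied estimates under that assumption, and all function names are hypothetical.

```python
def gap_recovery(baseline_loss, improved_loss, reference_loss):
    """Fraction of the baseline-to-reference loss gap closed by the improved model."""
    return (baseline_loss - improved_loss) / (baseline_loss - reference_loss)


def implied_reference_loss(baseline_loss, improved_loss, recovery):
    """Invert gap_recovery to estimate the reference (BF16) loss.

    Assumption: recovery is defined linearly on loss, as in gap_recovery.
    """
    return baseline_loss - (baseline_loss - improved_loss) / recovery


def implied_weight_count(size_mib, bits_per_weight):
    """Rough weight count implied by a file size and a bits-per-weight figure.

    Assumption: the whole file is weight data; real GGUF files also carry
    metadata, so this is only an order-of-magnitude check.
    """
    return size_mib * 1024**2 * 8 / bits_per_weight


# Figures quoted in the announcement.
STANDARD_Q4_K_M_LOSS = 3.4135
SPECTRALQUANT_LOSS = 2.9961
REPORTED_RECOVERY = 0.965

# Derived estimates, not reported values.
bf16_loss_estimate = implied_reference_loss(
    STANDARD_Q4_K_M_LOSS, SPECTRALQUANT_LOSS, REPORTED_RECOVERY
)
weights_estimate = implied_weight_count(415.7, 4.52)

print(f"Implied BF16 loss: {bf16_loss_estimate:.4f}")
print(f"Implied weight count: {weights_estimate / 1e9:.2f} billion")
```

Under these assumptions the implied BF16 loss is about 2.981, so the remaining gap is roughly 0.015 in loss, and the implied weight count of just under 0.8 billion is consistent with the model's stated size.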

The method lets users run a highly compressed model at close to full-precision quality without a specialized inference engine or a larger memory footprint.