Spectral Labs has published a release candidate of a calibration-aware Q4_K_M quantization of the Qwen3.5 0.8B model, built with a new method called SpectralQuant. The approach aims to make a standard Q4_K_M footprint behave more like a larger quant format while remaining compatible with llama.cpp.
- SpectralQuant identifies behaviorally sensitive directions using calibration signals and shapes error to protect critical weights rather than spreading it evenly.
- The model recovers 96.5% of the gap between standard Q4_K_M and BF16 on the heldout120 evaluation suite, lowering loss from 3.4135 to 2.9961.
- At 4.52 bits per weight (BPW), or 415.7 MiB, the SpectralQuant file outperforms Unsloth's Q4_K_S, Q4_K_M, IQ4_NL, and IQ4_XS quants on heldout120, even though those files are larger.
- The output is a strictly standard GGUF file that runs with llama-cli or llama-server and contains no mixed-precision sidecars or dynamic quant formats.
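
The headline numbers can be sanity-checked with a little arithmetic. The sketch below assumes that "recovery" means the fraction of the loss gap between the baseline quant and BF16 that the new quant closes, and that 3.4135 is the standard Q4_K_M loss; neither the formula nor the BF16 loss is stated in the release notes, so the back-solved values are estimates, not reported figures. The function names are illustrative.

```python
def gap_recovery(baseline_loss, candidate_loss, reference_loss):
    """Fraction of the baseline-to-reference loss gap closed by the candidate."""
    gap = baseline_loss - reference_loss
    if gap <= 0:
        raise ValueError("baseline loss must be higher than reference loss")
    return (baseline_loss - candidate_loss) / gap


def implied_reference_loss(baseline_loss, candidate_loss, recovery):
    """Back-solve the reference loss from a reported recovery fraction."""
    return baseline_loss - (baseline_loss - candidate_loss) / recovery


def implied_weight_count(size_mib, bits_per_weight):
    """Rough weight count implied by file size and BPW (ignores metadata)."""
    return size_mib * 1024**2 * 8 / bits_per_weight


# Figures quoted in the summary above.
Q4_K_M_LOSS = 3.4135   # assumed to be the standard Q4_K_M loss
SPECTRAL_LOSS = 2.9961
RECOVERY = 0.965

# Estimate only: the BF16 loss is not given in the source.
bf16_estimate = implied_reference_loss(Q4_K_M_LOSS, SPECTRAL_LOSS, RECOVERY)
weights_estimate = implied_weight_count(415.7, 4.52)

print(f"implied BF16 loss: {bf16_estimate:.4f}")
print(f"implied weights:   {weights_estimate / 1e6:.0f}M")
```

Under these assumptions the implied BF16 loss comes out at about 2.98, and the implied weight count is a little under 0.8 billion, which is consistent with the model's stated size. Both values depend on how the published figures were rounded.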
In practice, this lets users run a 4-bit model at close to BF16 quality on heldout120 without a specialized inference engine or a larger memory footprint.
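
One quick way to confirm that a download is a plain GGUF container is to read its fixed-size header. The sketch below is a minimal check, assuming a little-endian GGUF file of version 2 or later, where the header is the 4-byte magic `GGUF`, a 32-bit version, and 64-bit tensor and metadata key-value counts. The function name is illustrative and not part of llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + two uint64 counts


def read_gguf_header(path):
    """Read the fixed GGUF header fields (assumes little-endian, version >= 2)."""
    with open(path, "rb") as f:
        head = f.read(HEADER_SIZE)
    if len(head) < HEADER_SIZE or head[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not start with a GGUF header")
    # "<IQQ": little-endian uint32 version, uint64 tensor count, uint64 KV count.
    version, tensor_count, kv_count = struct.unpack("<IQQ", head[4:HEADER_SIZE])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

Calling `read_gguf_header("model.gguf")` (a placeholder path) returns the version and counts, or raises `ValueError` if the file is not GGUF. This only checks the container header; it does not verify which quantization types the tensors use.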