QAT KV Cache Quantization for Gemma 4 31B Shows Massive Improvement

With QAT, KV cache quantization for Gemma 4 31B produces much lower KL divergence than standard quants. QAT q8_0 reaches a worst-case divergence of 1.5, about 38 times lower than standard q4_0. Even QAT q4_0 beats standard q8_0: its output drifts much less and it has no catastrophic outliers.