A detailed analysis measures the Kullback-Leibler divergence (KLD) introduced by KV cache quantization on the Qwen3.6-35B-A3B and Gemma4-E2B models. The results show that q8/q8 quantization is nearly lossless on both models, while q4/q4 holds up well on Qwen but causes severe degradation on Gemma. The turbo quantization variants give mixed results: turbo3 and turbo2 enable extreme cache compression, but at a significant accuracy cost.
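
As a rough illustration of the metric, the sketch below computes the mean KL divergence between the next-token distributions of a reference run and a run with a quantized KV cache. The summary does not say which tool produced the reported numbers, so the function names (`log_softmax`, `kl_divergence`, `mean_kld`) and the toy logits are assumptions made for this example only, not measurements or code from the analysis.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one token position.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_divergence(ref_logits, test_logits):
    # KL(P_ref || P_test) in nats for a single token position.
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kld(ref_batch, test_batch):
    # Average the per-position divergence over all evaluated positions.
    values = [kl_divergence(r, t) for r, t in zip(ref_batch, test_batch)]
    return sum(values) / len(values)

# Toy logits with made-up values (hypothetical, for illustration only).
ref = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]    # unquantized cache
mild = [[2.0, 1.0, 0.1], [0.5, 2.4, 0.0]]   # small perturbation
harsh = [[1.0, 2.0, 0.1], [2.5, 0.5, 0.0]]  # top token flipped

print("mild :", mean_kld(ref, mild))
print("harsh:", mean_kld(ref, harsh))
```

A mean KLD close to zero means the quantized cache leaves the output distribution almost unchanged, which is the sense in which a setting can be called nearly lossless; larger values indicate the kind of degradation described for the more aggressive settings.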
KLD Analysis of KV Cache Quantization for Qwen3.6-35B-A3B and Gemma4-E2B QAT