A Reddit post reports that Gemma 4 QAT shows significant improvement in performance when using KV cache quantization, as measured on the wikitext dataset with 16k context. The user notes their hardware limits testing 31B models and invites others to explore the results.
Gemma 4 QAT responds better to KV cache quantization
from English