Slow performance Unsloth Gemma 12B Q8
A user reports a significant drop in inference speed after switching from GPT-OSS 20B Q4 to Gemma 4 12B Q8 in llama.cpp. Throughput falls from approximately 70 tokens per second to about 10. The slowdown persists with a Q5 variant of the model. Disabling the thinking feature helps only marginally, adding about two tokens per second.