A user reports successfully running a Qwen 3.6 27B model with Q6K+MTP quantization and 131k context length on a 7900XTX with 24GB VRAM. This is achieved using kvcache quantization (Q5_0/Q4_0), which reduces VRAM usage by 12% compared to Q8, enabling the model to run at 55-60 tokens per second with specific compile flags and llama.cpp arguments.
7900XTX 24GB VRAM Runs Qwen 3.6 27B with 131k Context
from English