A user shares optimized settings for running Qwen 3.6 27B with Q8_0 quantization on an RTX 4090 and RTX 3090 setup using llama.cpp. The configuration includes tensor split, 999 layers on GPU, 250k context, speculative decoding, and unified KV cache, achieving 75-100t/s throughput with vision and MTP support.
Best Settings for 48GB VRAM with Qwen 3.6 27B
from English