Qwen3.6-27B runs at ~60 tokens/sec on 32GB of VRAM with FP8 KV cache quantization. NVFP4 KV cache quantization on SM120 could improve performance on such systems significantly, but support for it is not yet available.
NVFP4 KV cache quantization on SM120 will make 32GB VRAM systems very capable.
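As a rough illustration of why NVFP4 would help on 32GB cards, here is a minimal sketch that compares how many tokens of KV cache fit in a fixed memory budget under FP8 and under NVFP4. It assumes NVFP4's layout of 4-bit E2M1 values with one FP8 (E4M3) scale shared by each 16-element block, which works out to about 4.5 bits per element versus 8 for FP8. The model shape and memory budget are hypothetical placeholders, not Qwen3.6-27B's real configuration. The sketch models capacity only and says nothing about kernel speed on SM120.

```python
# Effective storage cost per KV-cache element, in bits.
# FP8: one byte per element.
# NVFP4: a 4-bit E2M1 value plus one 8-bit E4M3 scale shared by each
# 16-element block. The single FP32 per-tensor scale is ignored as negligible.
FP8_BITS = 8.0
NVFP4_BITS = 4.0 + 8.0 / 16  # = 4.5 bits per element


def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bits_per_element):
    """Bytes of K and V cache stored for one token across all layers."""
    elements = 2 * num_layers * num_kv_heads * head_dim  # factor 2: keys and values
    return elements * bits_per_element / 8


def max_cached_tokens(budget_gib, num_layers, num_kv_heads, head_dim, bits_per_element):
    """Number of tokens whose KV cache fits in the given memory budget."""
    budget_bytes = budget_gib * 1024**3
    per_token = kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bits_per_element)
    return int(budget_bytes // per_token)


if __name__ == "__main__":
    # HYPOTHETICAL model shape and KV-cache budget, for illustration only.
    # These are NOT the published Qwen3.6-27B parameters.
    layers, kv_heads, head_dim = 64, 8, 128
    budget_gib = 4.0  # hypothetical VRAM left for KV cache after loading weights

    for name, bits in [("FP8", FP8_BITS), ("NVFP4", NVFP4_BITS)]:
        print(name, max_cached_tokens(budget_gib, layers, kv_heads, head_dim, bits))
    print("capacity ratio NVFP4/FP8:", FP8_BITS / NVFP4_BITS)
```

With these placeholder numbers the script reports 32768 tokens for FP8 and 58254 for NVFP4, i.e. roughly 1.78× more context in the same memory. Because the ratio depends only on bits per element, it holds for any model shape; the absolute token counts do not.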