Qwen3.6-27B runs at ~60 tokens/sec on 32 GB of VRAM with FP8 KV cache quantization. NVFP4 KV cache quantization on SM120 could significantly improve performance on such systems, but no implementation is available yet.
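
To get a rough sense of what is at stake, here is a back-of-the-envelope sketch comparing KV cache size per token under FP8 and NVFP4. The layer count, KV head count, and head dimension are placeholder values chosen for illustration, not the actual Qwen3.6-27B configuration, and the function name is made up for this example. The NVFP4 figure assumes the usual block layout (4-bit values plus one 8-bit scale per 16 elements) and ignores the per-tensor scale.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bits_per_element):
    """Bytes of KV cache needed for one token across all layers."""
    # Keys and values are both cached, hence the factor of 2.
    elements = 2 * num_layers * num_kv_heads * head_dim
    return elements * bits_per_element / 8


# Effective storage cost per cached element.
FP8_BITS = 8.0
# NVFP4: 4-bit values plus one 8-bit scale shared by each 16-element block.
NVFP4_BITS = 4.0 + 8.0 / 16.0

# ASSUMPTION: placeholder architecture, not the real model config.
NUM_LAYERS = 64
NUM_KV_HEADS = 8
HEAD_DIM = 128

fp8_bytes = kv_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, FP8_BITS)
nvfp4_bytes = kv_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, NVFP4_BITS)
ratio = fp8_bytes / nvfp4_bytes

print(f"FP8:   {fp8_bytes / 1024:.1f} KiB per token")
print(f"NVFP4: {nvfp4_bytes / 1024:.1f} KiB per token")
print(f"FP8 / NVFP4 size ratio: {ratio:.2f}x")
```

Under these assumptions the cache shrinks by a bit under 2x, which would mostly show up as room for longer contexts or larger batches in the same 32 GB. Whether it also raises tokens/sec depends on the attention kernels, so treat this as a memory estimate only.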