Qwen3.6 27B on RTX 5090 achieves 140 tok/s mean with tuned llama.cpp settings

A user shares detailed performance metrics for running the Qwen3.6 27B model with llama.cpp on a system with an RTX 5090, an AMD 9800X3D, and 64GB of RAM.

Tuning involved a q8 KV cache, a 192k context, MTP draft=10, spec-draft-p-min=0.5, and batch/ubatch sizes of 512.
Analysis of 6,454 samples over a mixed agentic coding session showed a mean throughput of 140.7 tok/s and a median of 134.9 tok/s.
The most common speeds fell in the 120-130 tok/s bucket, with a long tail extending up to 233 tok/s, which is why the mean sits above the median.
The author notes that hybrid attention/SWA cache handling in llama.cpp is not yet perfect for this model, causing prompt reprocessing warnings.

The post highlights that average numbers can hide performance variations, and it provides a real distribution of speeds rather than just a headline figure.
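
To make the point about averages concrete, here is a minimal Python sketch of how a mean, a median, and a most-populated bucket could be computed from per-request tok/s samples. The `summarize` helper, the 10 tok/s bucket width, and the sample values are all assumptions made up for illustration; they are not the author's data or tooling.

```python
from collections import Counter
from statistics import mean, median


def summarize(samples, bucket_width=10):
    """Summarize tok/s samples: mean, median, most populated bucket, and max."""
    # Assign each sample to the lower edge of its bucket, e.g. 126.5 -> 120.
    buckets = Counter(int(s // bucket_width) * bucket_width for s in samples)
    # Pick the bucket with the most samples; ties go to the lower bucket.
    mode_start, _ = max(buckets.items(), key=lambda kv: (kv[1], -kv[0]))
    return {
        "mean": mean(samples),
        "median": median(samples),
        "mode_bucket": (mode_start, mode_start + bucket_width),
        "max": max(samples),
        "histogram": dict(sorted(buckets.items())),
    }


# Hypothetical hand-made samples (NOT the measurements from the post).
samples = [118.0, 122.5, 124.0, 126.5, 128.0, 131.0, 135.0, 142.0, 160.0, 233.0]
stats = summarize(samples)

for name in ("mean", "median", "mode_bucket", "max"):
    print(name, stats[name])
for start, count in stats["histogram"].items():
    print(f"{start}-{start + 10} tok/s: {'#' * count}")
```

With these invented numbers, a single fast outlier is enough to pull the mean above the median, which in turn sits above the most populated bucket. The post reports the same ordering for its real measurements.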