A user reports achieving only 60 tokens per second in short bursts and average 40-45 TPS when running Qwen 3.6 27B with Q8_0 quantization on two GeForce 3090 GPUs connected via NVLink. The setup includes Ubuntu 24.04, Ryzen 7950x3D, and 64GB DDR5, with display routed through an eGPU.
Maximizing Performance of 2x3090 with NVLink
from English