A user reached 100 tokens per second (t/s) running Qwen3.6-27B at Q8_0 across two GPUs, an RTX 5090 and an RTX 3090 Ti. Switching from layer split mode to tensor split mode raised throughput from 70 to 100 t/s, using a 70/30 tensor split weighted toward the 5090 to match its greater compute power. Throughput varies by prompt and reached as high as 130 t/s.
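As a rough sanity check on those numbers, the sketch below estimates how much weight memory a 70/30 split would place on each card and what the 70 to 100 t/s change works out to as a ratio. It rests on several assumptions that are not in the post: that the model has a nominal 27 billion parameters, that Q8_0 uses the GGML layout of 32 int8 weights plus one fp16 scale per block (34 bytes per 32 weights), and that the split ratio applies evenly to all weights. KV cache, activations, and runtime overhead are ignored, and the function names are hypothetical.

```python
def q8_0_size_gb(n_params: float) -> float:
    """Approximate weight size in GB for Q8_0 (assumed: 34 bytes per 32 weights)."""
    return n_params * 34 / 32 / 1e9


def split_by_ratio(total: float, ratios: list[float]) -> list[float]:
    """Divide a total across devices in proportion to the given ratios."""
    ratio_sum = sum(ratios)
    return [total * r / ratio_sum for r in ratios]


def speedup(before_tps: float, after_tps: float) -> float:
    """Throughput ratio between two measured tokens-per-second figures."""
    return after_tps / before_tps


if __name__ == "__main__":
    weights_gb = q8_0_size_gb(27e9)  # assumed nominal parameter count
    share_5090, share_3090ti = split_by_ratio(weights_gb, [70, 30])

    print(f"Estimated Q8_0 weights: {weights_gb:.1f} GB")
    print(f"70% share (RTX 5090):    {share_5090:.1f} GB")
    print(f"30% share (RTX 3090 Ti): {share_3090ti:.1f} GB")
    print(f"Layer split -> tensor split: {speedup(70, 100):.2f}x")
```

Under these assumptions the weights come to roughly 29 GB, with about 20 GB on the 5090 and under 9 GB on the 3090 Ti, and the reported change corresponds to a speedup of about 1.4x.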
100 t/s on Qwen3.6-27B Q8 across 5090 + 3090 Ti with tensor split-mode