media r/LocalLLaMA · 5d ago · open_models

$1800 of GPUs runs Qwen3.6-27B with 262K context at 55 tok/s


A setup built from four 5060 Ti GPUs ($1800 in total) runs Qwen3.6-27B-FP8 at about 55 tokens per second, with a 262K-token context length and a bfloat16 KV cache. The configuration uses P2P and FlashInfer. The reported benchmark shows an output throughput of 55.67 tokens per second and a speculative decoding acceptance rate of 65.25%.
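To put the quoted numbers in perspective, here is a rough back-of-envelope sketch in Python. It uses only figures stated above; the 27e9 parameter count is taken nominally from the "27B" in the model name, the one-byte-per-weight figure is the usual FP8 storage size, and the even four-way split of weights is an assumption about how the model is sharded, not something the post confirms. The function names are hypothetical helpers for illustration.

```python
# Back-of-envelope figures derived from the numbers quoted in the post.
# Assumptions (not confirmed by the source): weights are split evenly
# across the four GPUs, and "27B" means 27e9 parameters.

NUM_GPUS = 4
TOTAL_GPU_COST_USD = 1800.0
PARAMS = 27e9               # nominal, from the "27B" in the model name
BYTES_PER_PARAM_FP8 = 1     # FP8 stores one byte per weight
OUTPUT_TOK_PER_S = 55.67    # reported output throughput


def cost_per_gpu(total_usd: float, num_gpus: int) -> float:
    """Average price paid per card."""
    return total_usd / num_gpus


def weight_gb(params: float, bytes_per_param: int) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def seconds_to_generate(num_tokens: int, tok_per_s: float) -> float:
    """Time to emit num_tokens at a steady output throughput."""
    return num_tokens / tok_per_s


if __name__ == "__main__":
    total_weights = weight_gb(PARAMS, BYTES_PER_PARAM_FP8)
    print(f"Cost per GPU: ${cost_per_gpu(TOTAL_GPU_COST_USD, NUM_GPUS):.0f}")
    print(f"FP8 weights: ~{total_weights:.1f} GB total")
    print(f"Weights per GPU (even split assumed): ~{total_weights / NUM_GPUS:.2f} GB")
    print(f"Time for 1000 output tokens: ~{seconds_to_generate(1000, OUTPUT_TOK_PER_S):.1f} s")
```

The weight estimate covers parameters only; the bfloat16 KV cache for a 262K-token context, activations, and runtime overhead come on top of it and are not modeled here.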

Importance 2/3 · r/LocalLLaMA · Alibaba (Qwen) · Code generation · Inference efficiency · Reasoning models

Benchmarks

Benchmark	Model	Score
Terminal-Bench	Qwen3.6-27B-FP8	55.67 tok/s
