Qwen3.6-35B-A3B APEX on RTX 3090: Speed and Quality Benchmarks
A benchmark compares two llama.cpp forks, ik_llama and spiritbuun, running Qwen3.6-35B-A3B APEX in its I-Compact and I-Quality variants. ik_llama with I-Compact reaches the highest speed (about 146 tokens per second, TPS). spiritbuun with I-Quality and a turbo8/turbo4 KV cache matches that speed and scores slightly higher on HellaSwag. The turbo8/turbo4 KV caches outperform q8_0/q5_0, especially at longer contexts. They deliver up to a 15% speed gain and a lower KL divergence (KLD), so they are the better choice for both output quality and long-context use.
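KLD comparisons like this one usually measure the KL divergence between the per-token probability distributions of a reference run and a test run, such as one using a quantized KV cache. A lower value means the test run changes the model's predictions less. The sketch below assumes you already have per-position logits from both runs as plain lists. It does not describe how this benchmark collected them. It also shows how a percentage speed gain is computed from two TPS readings. All function names are hypothetical and are not part of llama.cpp or either fork.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one token position's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_kld(base_logits: Sequence[float], test_logits: Sequence[float]) -> float:
    """KL(P_base || P_test) in nats for a single token position."""
    lp = log_softmax(base_logits)
    lq = log_softmax(test_logits)
    # sum_i p_i * (log p_i - log q_i), with p_i recovered from log p_i
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def mean_kld(base_run: Sequence[Sequence[float]],
             test_run: Sequence[Sequence[float]]) -> float:
    """Average per-token KLD across all positions of two aligned runs."""
    if not base_run or len(base_run) != len(test_run):
        raise ValueError("runs must be non-empty and cover the same positions")
    return sum(token_kld(b, t) for b, t in zip(base_run, test_run)) / len(base_run)


def speed_gain_pct(baseline_tps: float, new_tps: float) -> float:
    """Relative speed gain of new_tps over baseline_tps, in percent."""
    return 100.0 * (new_tps - baseline_tps) / baseline_tps


if __name__ == "__main__":
    # Toy inputs for illustration only, not measurements from this benchmark.
    print(mean_kld([[0.0, 0.0]], [[0.0, math.log(3.0)]]))  # 0.5 * ln(4/3)
    print(speed_gain_pct(100.0, 115.0))                    # 15.0
```

Identical logits give a KLD of exactly zero. Working in log space avoids the division-by-zero and underflow problems that a naive `p * log(p / q)` over softmax outputs can hit when vocabularies are large.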