Benchmarks of llama.cpp on the B70 with the SYCL backend show strong results on models such as gemma4 12B and 26B, with throughput peaking at 5662.45 t/s for the E2B model. Throughput is far lower in the tg128 (text generation) test, where qwen35 27B reaches only 15.42 t/s, which suggests room for optimization.
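
To put the two quoted figures in more tangible terms, the following minimal sketch converts a throughput in t/s into average per-token latency and into the wall-clock time for a fixed number of tokens. The helper names `ms_per_token` and `seconds_for` are hypothetical, and the 128-token count assumes that tg128 denotes a 128-token generation run, as in llama-bench. It is plain arithmetic on the numbers above, not a reproduction of the benchmark.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Average latency per token in milliseconds for a given throughput."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / tokens_per_second


def seconds_for(n_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds needed to process n_tokens at a given throughput."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return n_tokens / tokens_per_second


# Figures quoted in the text above (different models, so not a like-for-like comparison).
PEAK_TPS = 5662.45   # peak throughput reported for the E2B model
TG128_TPS = 15.42    # qwen35 27B in the tg128 test

if __name__ == "__main__":
    print(f"peak:  {ms_per_token(PEAK_TPS):.3f} ms/token")
    print(f"tg128: {ms_per_token(TG128_TPS):.1f} ms/token")
    # Assumption: tg128 generates 128 tokens per run.
    print(f"tg128: {seconds_for(128, TG128_TPS):.1f} s for 128 tokens")
```

At 15.42 t/s each generated token takes roughly 65 ms, so a 128-token run lasts a little over 8 seconds. Because the two figures come from different models, the gap between them reflects model size as well as the test type.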