A gemma-4-26B-A4B model running CPU-only on two Xeon 6248R processors reaches 64 tokens per second for generation and 285 tokens per second for prompt processing (PP), showing viable performance on 6-year-old hardware. The poster highlights the potential for CPU-optimized local LLMs to rival GPU-based systems, with an emphasis on cost efficiency and accessibility.
Who needs GPUs? 64 t/s gen, 285 PP on 6-year-old CPUs
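
To put those two numbers in perspective, here is a rough back-of-the-envelope sketch of what they could mean for a single request. It assumes both rates stay constant regardless of context length, which usually does not hold in practice, and the prompt and output sizes are made-up illustration values, not figures from the post.

```python
def estimate_latency_seconds(prompt_tokens, output_tokens,
                             pp_rate=285.0, gen_rate=64.0):
    """Rough end-to-end time for one request, in seconds.

    pp_rate and gen_rate default to the figures quoted in the post
    (tokens per second). Assumption: both rates are constant, so this
    ignores any slowdown as the context grows.
    """
    if pp_rate <= 0 or gen_rate <= 0:
        raise ValueError("rates must be positive")
    prompt_time = prompt_tokens / pp_rate   # time to ingest the prompt
    gen_time = output_tokens / gen_rate     # time to produce the reply
    return prompt_time + gen_time


if __name__ == "__main__":
    # Hypothetical workload: 2000-token prompt, 500-token reply.
    total = estimate_latency_seconds(2000, 500)
    print(f"estimated total: {total:.1f} s")
```

With these assumed sizes, prompt processing and generation take roughly the same share of the total, so for long prompts the PP figure matters about as much as the headline generation speed.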