Gemma 4 WebGPU Kernels Achieve 255 tok/s

Xenova has released WebGPU kernels for Gemma 4 that reach 255 tokens per second (tok/s). The optimization lets dense models run at more than 100 tok/s inside a web browser.

The kernels use WebGPU to accelerate inference.
Throughput reaches 255 tok/s on Gemma 4.
A demo is available via the webml-community Hugging Face space.
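To put the throughput figures in perspective, the short sketch below converts tokens per second into average time per token and into total time for a reply. It assumes a steady decode rate and ignores prompt processing (prefill) and time to first token. The 500-token reply length and the helper names are hypothetical, chosen only for illustration.

```python
def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average milliseconds spent generating each token at a steady decode throughput."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / tokens_per_second


def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream num_tokens at a steady throughput (prefill is ignored)."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return num_tokens / tokens_per_second


# Compare the 100 tok/s threshold with the reported 255 tok/s,
# using a hypothetical 500-token reply.
for rate in (100, 255):
    print(f"{rate} tok/s -> {per_token_latency_ms(rate):.2f} ms/token, "
          f"{generation_time_s(500, rate):.2f} s for a 500-token reply")
# 100 tok/s -> 10.00 ms/token, 5.00 s for a 500-token reply
# 255 tok/s -> 3.92 ms/token, 1.96 s for a 500-token reply
```

Under these assumptions, 255 tok/s works out to roughly 4 ms per token, so a medium-length answer streams in about two seconds.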

At this speed, private, locally run models can handle most everyday tasks, reducing reliance on frontier model APIs such as Claude or Codex.