Xenova has released WebGPU kernels for Gemma 4 that reach 255 tokens per second (tok/s). With this optimization, dense models can run at more than 100 tok/s in a web browser.
- The implementation uses WebGPU to accelerate inference.
- Performance reaches 255 tok/s on the Gemma 4 model.
- A demo is available via the webml-community Hugging Face space.
At this speed, local, private models can handle most tasks, reducing reliance on frontier APIs like Claude or Codex for everyday work.
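
To put the throughput figures above in concrete terms, here is a minimal TypeScript sketch of the arithmetic behind a tok/s number. The helper names `tokensPerSecond` and `secondsToGenerate` are hypothetical and are not part of Xenova's release. The sketch assumes throughput is simply generated tokens divided by wall-clock time; the source does not say how the 255 tok/s figure was measured.

```typescript
// Hypothetical helper (not from the release): converts a generated-token
// count and elapsed wall-clock time in milliseconds into tokens per second.
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    throw new RangeError("elapsedMs must be positive");
  }
  return (tokenCount * 1000) / elapsedMs;
}

// Hypothetical helper: estimates how many seconds a response of
// `tokenCount` tokens takes at a steady rate given in tokens per second.
function secondsToGenerate(tokenCount: number, rate: number): number {
  if (rate <= 0) {
    throw new RangeError("rate must be positive");
  }
  return tokenCount / rate;
}

// 510 tokens generated in 2 seconds corresponds to 255 tok/s.
const measured = tokensPerSecond(510, 2000);

// At that rate, a 1000-token answer takes a little under 4 seconds,
// compared with 10 seconds at 100 tok/s.
const fast = secondsToGenerate(1000, measured);
const baseline = secondsToGenerate(1000, 100);
console.log(measured, fast.toFixed(2), baseline);
```

In a browser, the elapsed time could be taken from two `performance.now()` readings around the generation call. This ignores prompt processing and model load time, so it describes steady-state decoding only.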