Xenova has released WebGPU kernels for Gemma 4 that reach 255 tokens per second (tok/s). With this optimization, dense models can run at more than 100 tok/s in a web browser.

  • The implementation uses WebGPU to accelerate inference in the browser.
  • Performance reaches 255 tok/s on the Gemma 4 model.
  • A demo is available via the webml-community Hugging Face space.

This speed allows local private models to handle most tasks, reducing reliance on frontier APIs like Claude or Codex for everyday work.
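To put the throughput figures in concrete terms, here is a minimal sketch of the arithmetic. The helper names and the 500-token reply length are illustrative assumptions, not part of the release, and the calculation assumes a steady decode rate while ignoring prompt processing and model load time.

```typescript
// Hypothetical helper: seconds needed to generate `tokens` at a steady decode rate.
function secondsToGenerate(tokens: number, tokensPerSecond: number): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return tokens / tokensPerSecond;
}

// Hypothetical helper: throughput (tok/s) from a measured run.
function measureThroughput(tokens: number, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    throw new RangeError("elapsedMs must be positive");
  }
  return tokens / (elapsedMs / 1000);
}

// Assumed reply length, chosen only for illustration.
const replyTokens = 500;

// Compare the two rates mentioned in the text: 100 tok/s and 255 tok/s.
for (const rate of [100, 255]) {
  const seconds = secondsToGenerate(replyTokens, rate).toFixed(2);
  console.log(`${rate} tok/s: ${seconds} s for ${replyTokens} tokens`);
}
```

Under these assumptions, a 500-token reply takes about 5 seconds at 100 tok/s and about 2 seconds at 255 tok/s. `measureThroughput` can be used the other way around, to turn a token count and a wall-clock duration from your own run into a tok/s figure.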