The llama.cpp project has released version b9844, which introduces ggml-webgpu support for the NVFP4 quantization format. This update also provides pre-built binaries for macOS, iOS, Linux, Android, Windows, and openEuler across various hardware backends.

  • Added NVFP4 support to ggml-webgpu via pull request #25143.
  • Disabled KleidiAI builds for macOS Apple Silicon and openEuler in this release.
  • Provided binaries for Ubuntu (CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Windows (CPU, CUDA 12/13, Vulkan, OpenVINO, SYCL, HIP), and Android arm64.
  • Released macOS binaries for Apple Silicon (arm64) and Intel (x64), along with an iOS XCFramework.
  • Included UI binaries for general use.

This release lets developers run NVFP4-quantized models on the ggml WebGPU backend, and it ships updated pre-built executables for a wide range of operating systems and hardware backends.
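For readers new to NVFP4, the sketch below shows how one block of the format decodes to real values. It assumes the layout NVIDIA describes for NVFP4: 16 four-bit E2M1 elements share one FP8 E4M3 block scale, and an optional FP32 per-tensor scale multiplies on top. This is not ggml's actual block struct or the WebGPU shader from #25143. The function names, the 8-byte packing, and the low-nibble-first order are hypothetical choices made for illustration.

```python
# Minimal NVFP4 block decoder (illustrative sketch, not ggml's implementation).

# E2M1 magnitudes indexed by the low 3 bits (2 exponent bits + 1 mantissa bit).
# Bit 3 of the nibble is the sign.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(nibble: int) -> float:
    """Decode one 4-bit E2M1 element."""
    sign = -1.0 if nibble & 0x8 else 1.0
    return sign * E2M1_VALUES[nibble & 0x7]


def decode_e4m3(byte: int) -> float:
    """Decode an FP8 E4M3 value (bias 7, no infinities, NaN at S.1111.111)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    mant = byte & 0x7
    if exp == 0xF and mant == 0x7:
        return float("nan")
    if exp == 0:
        # Subnormal: no implicit leading 1.
        return sign * (mant / 8.0) * 2.0 ** (1 - 7)
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7)


def decode_block(scale_byte: int, packed: bytes, tensor_scale: float = 1.0) -> list[float]:
    """Decode 16 elements packed two per byte (hypothetical low-nibble-first order)."""
    if len(packed) != 8:
        raise ValueError("an NVFP4 block holds 16 elements = 8 packed bytes")
    block_scale = decode_e4m3(scale_byte) * tensor_scale
    out = []
    for b in packed:
        out.append(decode_e2m1(b & 0xF) * block_scale)  # low nibble first
        out.append(decode_e2m1(b >> 4) * block_scale)   # then high nibble
    return out


if __name__ == "__main__":
    # Scale byte 0x40 encodes 2.0 in E4M3; byte 0x21 packs nibbles 1 (0.5) and 2 (1.0).
    print(decode_block(0x40, bytes([0x21, 0, 0, 0, 0, 0, 0, 0])))
```

The two-level scaling is what lets a 4-bit element cover a useful dynamic range. E2M1 alone spans only 0 to 6, so the per-block FP8 scale and the per-tensor FP32 scale carry the magnitude.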