The llama.cpp project has released version b9844, which adds support for the NVFP4 quantization format to the ggml-webgpu backend. The release also provides pre-built binaries for macOS, iOS, Linux, Android, Windows, and openEuler, covering several hardware backends.
- Added NVFP4 support to ggml-webgpu via pull request #25143.
- Disabled KleidiAI builds for macOS Apple Silicon and openEuler in this release.
- Provided binaries for Ubuntu (CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Windows (CPU, CUDA 12/13, Vulkan, OpenVINO, SYCL, HIP), and Android arm64.
- Released macOS binaries for Apple Silicon (arm64) and Intel (x64), along with an iOS XCFramework.
- Included UI binaries for general use.
This release lets developers run NVFP4-quantized models through the WebGPU backend, and it provides updated pre-built binaries for the operating systems and hardware backends listed above.