The llama.cpp project has released build b9855, which adds an AVX2-optimized nvfp4 dot product to the ggml-cpu backend. The optimization uses a look-up table (LUT) for UE4M3 values.

  • macOS Apple Silicon and Intel builds are available alongside an iOS XCFramework.
  • Linux (Ubuntu) binaries cover the x64, arm64, and s390x architectures, with CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL (FP32/FP16) backends among the builds.
  • Windows releases include CPU, OpenCL Adreno, CUDA 12.4 and 13.3, Vulkan, OpenVINO, SYCL, and HIP backends.
  • Android arm64 CPU binaries are provided for mobile deployment.
  • KleidiAI support on macOS Apple Silicon is disabled in this release.

The only code change named in this release is the nvfp4 dot-product optimization, which benefits CPU inference on x86 processors with AVX2. The remaining items describe the platforms and backends for which prebuilt binaries are published.
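
To illustrate the idea behind a UE4M3 look-up table, here is a minimal scalar sketch in Python. It is not the ggml-cpu code and makes several assumptions: that UE4M3 is an unsigned 8-bit float with 4 exponent bits, 3 mantissa bits, and bias 7; that each block holds 16 four-bit E2M1 codes sharing one UE4M3 scale; and that the encoding 0x7F is mapped to 0.0 for simplicity. The names `decode_ue4m3`, `UE4M3_LUT`, `E2M1_LUT`, and `nvfp4_block_dot` are hypothetical. The table is built once so that the inner loop does an array lookup instead of decoding bits for every block.

```python
import math

# E2M1 magnitudes for the low 3 bits of a 4-bit code; bit 3 is the sign.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_LUT = E2M1_MAGNITUDES + [-m for m in E2M1_MAGNITUDES]


def decode_ue4m3(byte: int) -> float:
    """Decode an unsigned E4M3 value (assumed: 4 exponent bits, 3 mantissa bits, bias 7)."""
    byte &= 0x7F
    if byte == 0x7F:
        return 0.0  # assumption of this sketch: treat the NaN encoding as zero
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0:
        return math.ldexp(man, -9)  # subnormal: (man / 8) * 2**-6
    return math.ldexp(1.0 + man / 8.0, exp - 7)


# Precompute every scale once; later decodes are a single table lookup.
UE4M3_LUT = [decode_ue4m3(b) for b in range(128)]


def nvfp4_block_dot(codes, scale_byte, activations):
    """Dot product of one block of 4-bit codes with float activations.

    codes: iterable of 4-bit integers (assumed block of 16)
    scale_byte: UE4M3-encoded block scale
    activations: iterable of floats, same length as codes
    """
    scale = UE4M3_LUT[scale_byte & 0x7F]
    acc = 0.0
    for code, act in zip(codes, activations):
        acc += E2M1_LUT[code & 0xF] * act
    return scale * acc


if __name__ == "__main__":
    codes = [2] * 16           # code 2 decodes to 1.0
    acts = [1.0] * 16
    print(nvfp4_block_dot(codes, 0x40, acts))  # 0x40 decodes to a scale of 2.0
```

A vectorized AVX2 version would replace the Python loop with SIMD operations; how b9855 does that is not described in the release summary, so the sketch stops at the scalar reference.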