The llama.cpp b9831 release adds DFlash v2 support, including support for per-layer sliding window attention types. It also ships pre-built binaries for a wide range of platforms.

  • macOS Apple Silicon (arm64) and Intel (x64) builds are available, along with an iOS XCFramework.
  • Linux binaries cover Ubuntu x64 and arm64 CPU, s390x CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL FP32/FP16.
  • Android arm64 CPU builds are provided for mobile devices.
  • Windows builds include CPU, OpenCL Adreno, CUDA 12.4 and 13.3, Vulkan, OpenVINO, SYCL, and HIP variants.
  • openEuler x86 and aarch64 builds with ACL Graph support are included. The macOS KleidiAI and generic openEuler builds are disabled in this release.

With these builds, users can run llama.cpp on many hardware accelerators and operating systems and try the new DFlash v2 support.