The llama.cpp project has released version b9856, which applies the `restrict` qualifier and PDL (programmatic dependent launch) consistently in the CUDA Flash Attention code. Pre-built binaries are available for macOS, Linux, Android, Windows, and openEuler across a range of hardware backends.
- macOS Apple Silicon (arm64) builds are available, while KleidiAI support remains disabled.
- Linux binaries cover CPU (x64, arm64, s390x), Vulkan, ROCm 7.2, OpenVINO, and SYCL FP32/FP16.
- Windows releases include CPU, OpenCL Adreno, CUDA 12.4/13.3, Vulkan, OpenVINO, SYCL, and HIP.
- Android arm64 (CPU) and UI binaries are also provided.