The llama.cpp project has released version b9847, which includes a fix for Gemma E4B MTP FlashAttention on CUDA and the removal of an unused template declaration.
- Fixes Gemma E4B MTP FlashAttention in the CUDA backend (#25148)
- Removes unused template declaration
- macOS Apple Silicon (arm64) binaries available
- macOS Intel (x64) binaries available
- iOS XCFramework provided
- Ubuntu x64 and arm64 CPU builds included
- Ubuntu Vulkan, ROCm 7.2, OpenVINO, SYCL FP32, and SYCL FP16 builds available
- Android arm64 CPU build released
- Windows x64 and arm64 CPU builds provided
- Windows CUDA 12.4 and 13.3 builds with DLLs included
- Windows Vulkan, OpenVINO, SYCL, and HIP builds available
- openEuler x86 and aarch64 builds for Ascend 310p and 910b (ACL Graph) NPUs
- General UI binary released
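
To make the build matrix above easier to scan, here is a minimal sketch that looks up the variants listed for a given operating system. The `BUILDS` table and the `variants_for` helper are hypothetical names introduced for illustration, and the strings are descriptive labels copied from the list above, not actual release asset filenames.

```python
# Hypothetical lookup table built from the bullet list above.
# Labels are descriptive only; they are NOT real asset filenames.
BUILDS = {
    "macos": ["Apple Silicon (arm64)", "Intel (x64)"],
    "ios": ["XCFramework"],
    "ubuntu": ["x64 CPU", "arm64 CPU", "Vulkan", "ROCm 7.2",
               "OpenVINO", "SYCL FP32", "SYCL FP16"],
    "android": ["arm64 CPU"],
    "windows": ["x64 CPU", "arm64 CPU", "CUDA 12.4", "CUDA 13.3",
                "Vulkan", "OpenVINO", "SYCL", "HIP"],
    "openeuler": ["x86", "aarch64"],
}


def variants_for(os_name: str, keyword: str = "") -> list[str]:
    """Return the listed build labels for an OS, optionally filtered
    by a case-insensitive keyword such as "cuda" or "arm64"."""
    labels = BUILDS.get(os_name.lower(), [])
    needle = keyword.lower()
    return [label for label in labels if needle in label.lower()]


if __name__ == "__main__":
    print(variants_for("windows", "cuda"))
    print(variants_for("ubuntu", "sycl"))
```

The table is keyed by operating system only, because the list does not state which CPU architecture each accelerator build (CUDA, Vulkan, SYCL, and so on) targets.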