The llama.cpp project has released version b9867, which adds support for the spec-draft-p-min setting to DFlash. The change also adds a guard on n_min in the dflash module and then extends it so that both n_min and n_max are checked.
- Support for spec-draft-p-min in DFlash via pull request #25246.
- Addition of an n_min guard in the dflash module.
- Guarding of both n_min and n_max parameters.
- macOS Apple Silicon (arm64) binaries are provided, with KleidiAI support disabled.
- Linux builds available for Ubuntu x64/arm64/s390x with CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL backends.
- Android arm64 (CPU) binaries released.
- Windows x64/arm64 builds support CPU, OpenCL Adreno, CUDA 12/13, Vulkan, OpenVINO, SYCL, and HIP.
- openEuler x86 and aarch64 builds for 310p and 910b (ACL Graph) are available.
This release provides updated binaries across multiple platforms and hardware accelerators, so users can run llama.cpp with the new spec-draft-p-min support in DFlash.
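
The release notes name the n_min, n_max, and p-min settings but do not show how they interact. The following is a minimal sketch of one plausible reading, assuming that p_min is a confidence threshold that stops drafting early, n_max caps the draft length, and n_min rejects drafts that are too short to be worth verifying. The types `DraftCandidate` and `DraftLimits` and the function `select_draft` are hypothetical names chosen for illustration; this is not the DFlash code from pull request #25246.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical candidate from a draft model: a token id and the
// probability the draft model assigned to it.
struct DraftCandidate {
    int   token;
    float p;
};

// Hypothetical settings. The field names mirror n_min, n_max and p_min
// from the release notes; the struct itself is an assumption.
struct DraftLimits {
    int   n_min;  // fewest drafted tokens considered worth verifying
    int   n_max;  // most tokens to draft in one round
    float p_min;  // stop drafting once confidence falls below this
};

// Select the tokens to keep from a sequence of draft candidates.
std::vector<int> select_draft(const std::vector<DraftCandidate> & candidates,
                              const DraftLimits & limits) {
    std::vector<int> draft;

    // Guard n_max: a non-positive cap means nothing can be drafted.
    if (limits.n_max <= 0) {
        return draft;
    }

    for (const auto & c : candidates) {
        // Stop once the cap is reached.
        if (draft.size() >= static_cast<std::size_t>(limits.n_max)) {
            break;
        }
        // Stop at the first candidate below the confidence threshold.
        if (c.p < limits.p_min) {
            break;
        }
        draft.push_back(c.token);
    }

    // Guard n_min: discard a draft that is shorter than the minimum.
    // A non-positive n_min disables this check.
    if (limits.n_min > 0 &&
        draft.size() < static_cast<std::size_t>(limits.n_min)) {
        draft.clear();
    }

    return draft;
}
```

With candidates whose probabilities are 0.9, 0.8, 0.3, and 0.95, a call such as `select_draft(candidates, {1, 8, 0.75f})` keeps the first two tokens and stops at the third, while raising n_min to 3 discards the draft entirely. Checking both limits up front and at the end keeps a misconfigured n_min or n_max from producing an out-of-range draft length.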