The llama.cpp project has released version b9852, which introduces initial OpenCL support for the q1_0 quantization format. The update adds general q1_0 support to the OpenCL backend, along with Adreno-specific GEMM and GEMV kernels for Qualcomm Adreno GPUs.

- Initial OpenCL support for q1_0 quantization
- Adreno GEMM/GEMV kernels for q1_0
- macOS Apple Silicon (arm64) binaries provided
- KleidiAI on macOS Apple Silicon is disabled in this release
- Ubuntu builds available for CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL
- Windows builds include CUDA 12/13, Vulkan, OpenVINO, SYCL, HIP, and OpenCL Adreno
- Android arm64 (CPU) binaries released
- openEuler support for x86 and aarch64 architectures with ACL Graph enabled

This release extends q1_0 inference to OpenCL devices and provides updated binaries across multiple operating systems and accelerators.
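
For readers unfamiliar with the GEMM/GEMV distinction, the following is a minimal sketch of how a 1-bit block-quantized matrix could be used in a matrix-vector product (GEMV) and a matrix-matrix product (GEMM). It assumes a hypothetical layout in which each block stores one scale plus one sign bit per weight; the block size, the scale choice, and the function names `quantize_block`, `quantize_row`, `gemv`, and `gemm` are illustrative assumptions and do not reproduce the actual q1_0 layout or the OpenCL kernels in llama.cpp.

```python
# Illustrative only: a hypothetical 1-bit block format, NOT llama.cpp's actual q1_0 layout.
BLOCK_SIZE = 32  # assumed block size for this sketch


def quantize_block(weights):
    """Return (scale, bits): scale is the mean absolute value, bits holds one sign bit per weight."""
    scale = sum(abs(w) for w in weights) / len(weights)
    bits = 0
    for i, w in enumerate(weights):
        if w >= 0:
            bits |= 1 << i  # bit set means +scale, bit clear means -scale
    return scale, bits


def quantize_row(row):
    """Split one matrix row into fixed-size blocks and quantize each block."""
    assert len(row) % BLOCK_SIZE == 0
    return [quantize_block(row[i:i + BLOCK_SIZE]) for i in range(0, len(row), BLOCK_SIZE)]


def gemv(q_rows, x):
    """Matrix-vector product y = W x, with W stored as quantized rows."""
    y = []
    for blocks in q_rows:
        acc = 0.0
        for b, (scale, bits) in enumerate(blocks):
            base = b * BLOCK_SIZE
            partial = 0.0
            for i in range(BLOCK_SIZE):
                xi = x[base + i]
                # Each weight is +scale or -scale, so the inner loop only adds or subtracts.
                partial += xi if (bits >> i) & 1 else -xi
            acc += scale * partial  # apply the block scale once per block
        y.append(acc)
    return y


def gemm(q_rows, xs):
    """Matrix-matrix product computed as one GEMV per input vector in xs."""
    return [gemv(q_rows, x) for x in xs]
```

In this sketch the multiply by the scale happens once per block rather than once per weight, which is the usual appeal of sign-only formats. A real GPU kernel would handle the GEMM case with tiling rather than repeated GEMV calls; the loop form here is only meant to show the relationship between the two operations.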