The llama.cpp project has published release b9852, which introduces initial OpenCL support for the q1_0 quantization format. The update adds general q1_0 support to the OpenCL backend, along with GEMM/GEMV kernels written specifically for Qualcomm Adreno GPUs.

  • Initial OpenCL support for q1_0 quantization
  • Adreno-specific GEMM/GEMV kernels for q1_0
  • macOS Apple Silicon (arm64) binaries provided
  • KleidiAI on macOS Apple Silicon is disabled in this release
  • Ubuntu builds available for CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL
  • Windows builds include CUDA 12/13, Vulkan, OpenVINO, SYCL, HIP, and OpenCL Adreno
  • Android arm64 (CPU) binaries released
  • openEuler support for x86 and aarch64 architectures with ACL Graph enabled

This release extends q1_0 inference to OpenCL devices, with dedicated kernels for Adreno GPUs. It also ships updated binaries for multiple operating systems and accelerator backends.