llama.cpp release b9859 adds support for loading precompiled binary kernels from kernel libraries in the OpenCL backend, targeting Adreno GPUs. The release also ships prebuilt binaries for macOS, Linux, Windows, Android, and openEuler, covering CPU, GPU, and other accelerator backends.

  • Allows loading binary kernel libraries via ggml-backend-dl to resolve cyclic dependencies.
  • Loads the gemm_moe_mxfp4_f32_ns kernel and the q8_0, q4_0, q4_1, and q4_k MoE GEMM kernels from the kernel library.
  • Always declares get_adreno_bin_kernel_func_t for OpenCL Adreno support.
  • The macOS Apple Silicon build with KleidiAI is disabled in this release.
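
The lookup pattern behind the first three items can be sketched in plain C. This is a minimal illustration under stated assumptions, not the llama.cpp implementation: the function-pointer signature, the table layout, the demo blob, and the fallback to source compilation are all hypothetical, and the real get_adreno_bin_kernel_func_t may look different. The point is that the backend only holds a function pointer that is filled in at runtime (for example by a dynamic loader), so it never has to link against the kernel library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for a kernel-library entry point.
 * The real get_adreno_bin_kernel_func_t may differ. */
typedef const unsigned char * (*bin_kernel_lookup_fn)(const char * name, size_t * size);

/* Stand-in for a kernel library: precompiled blobs keyed by kernel name. */
struct bin_kernel {
    const char *          name;
    const unsigned char * data;
    size_t                size;
};

static const unsigned char demo_blob[] = { 0xde, 0xad, 0xbe, 0xef }; /* placeholder bytes */

static const struct bin_kernel demo_table[] = {
    { "gemm_moe_mxfp4_f32_ns", demo_blob, sizeof(demo_blob) },
};

/* Lookup function such a library could export (assumed name). */
static const unsigned char * demo_lookup(const char * name, size_t * size) {
    for (size_t i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
        if (strcmp(demo_table[i].name, name) == 0) {
            *size = demo_table[i].size;
            return demo_table[i].data;
        }
    }
    *size = 0;
    return NULL;
}

/* Filled in at runtime once a kernel library has been loaded.
 * NULL means no library is available. */
static bin_kernel_lookup_fn g_lookup = NULL;

enum kernel_source { KERNEL_FROM_BINARY, KERNEL_FROM_SOURCE };

/* Prefer a precompiled binary; otherwise report that the caller should
 * build the kernel from source (the fallback is an assumption). */
static enum kernel_source pick_kernel_source(const char * name,
                                             const unsigned char ** data,
                                             size_t * size) {
    *data = NULL;
    *size = 0;
    if (g_lookup != NULL) {
        *data = g_lookup(name, size);
        if (*data != NULL && *size > 0) {
            return KERNEL_FROM_BINARY;
        }
    }
    *data = NULL;
    *size = 0;
    return KERNEL_FROM_SOURCE;
}
```

Because the function-pointer type is declared unconditionally in this sketch, the calling code compiles the same way whether or not a kernel library is present; only the value of the pointer changes at runtime.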

The change lets the OpenCL backend use precompiled binary kernels on Adreno GPUs, which is intended to improve OpenCL performance, while the release keeps compatibility across a wide range of hardware platforms.