The llama.cpp b9850 release updates model support for the Qwen3 family: it registers the t_layer_inp tensor for Qwen3Next, fixes input assignment in the layer processing loop, and addresses DFLASH issues for qwen-coder-next. It also adds an attention normalization tensor for the Qwen3 model.

  • macOS Apple Silicon (arm64) binaries are available, with KleidiAI support disabled.
  • Linux builds cover Ubuntu x64 and arm64 CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL FP32/FP16 variants.
  • Android arm64 CPU binaries are provided for mobile devices.
  • Windows releases include CPU, OpenCL Adreno, CUDA 12.4/13.3, Vulkan, OpenVINO, SYCL, and HIP backends.
  • openEuler builds with ACL Graph support are included for x86 and aarch64.
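
The build list above can be captured as a small lookup table for scripts that need to check which variants apply to a host. This is only a sketch: the dictionary layout and the names RELEASE_VARIANTS and variants_for are hypothetical, and the strings are descriptive labels taken from the list, not real release asset file names.

```python
import platform

# Variants copied from the list above, keyed by OS family.
# Assumption: these are human-readable labels, NOT release asset names.
RELEASE_VARIANTS = {
    "macos": ["arm64 (Apple Silicon)"],
    "linux": [
        "CPU (Ubuntu x64 and arm64)",
        "Vulkan",
        "ROCm 7.2",
        "OpenVINO",
        "SYCL FP32",
        "SYCL FP16",
    ],
    "android": ["arm64 CPU"],
    "windows": [
        "CPU",
        "OpenCL Adreno",
        "CUDA 12.4",
        "CUDA 13.3",
        "Vulkan",
        "OpenVINO",
        "SYCL",
        "HIP",
    ],
    "openeuler": ["ACL Graph (x86 and aarch64)"],
}

# platform.system() reports "Darwin" on macOS, so map it to our key.
_SYSTEM_ALIASES = {"darwin": "macos"}


def variants_for(system_name=None):
    """Return the listed variants for an OS family, or [] if none is listed.

    With no argument, the host OS is detected via platform.system().
    openEuler reports itself as Linux, so it must be requested by name.
    """
    name = (system_name or platform.system()).lower()
    key = _SYSTEM_ALIASES.get(name, name)
    return RELEASE_VARIANTS.get(key, [])


if __name__ == "__main__":
    for variant in variants_for():
        print(variant)
```

Calling variants_for("Windows") returns the eight Windows entries, while an OS that is not in the list yields an empty list rather than an error.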

The update corrects model handling for Qwen3 series models and offers hardware-accelerated builds across multiple operating systems and GPU architectures.