The llama.cpp project has released build b9871, which fixes the CPU implementation of concatenation for quantized data types.

  • The core change fixes a bug in ggml's CPU concat logic for quantized types and adds new tests to verify correctness.
  • Pre-built binaries are available for macOS (Apple Silicon and Intel), Linux (Ubuntu x64/arm64/s390x with CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL), Windows (CPU, CUDA 12/13, Vulkan, OpenCL, OpenVINO, SYCL, HIP), Android arm64, and openEuler.
  • An updated UI binary is also included in this release.