The llama.cpp project released build b9871, which fixes the CPU concatenation (concat) implementation for quantized data types.
- The core change fixes a bug in ggml's CPU concat logic for quantized types and adds tests to verify correctness.
- Pre-built binaries are available for macOS (Apple Silicon and Intel), Linux (Ubuntu x64/arm64/s390x with CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL), Windows (CPU, CUDA 12/13, Vulkan, OpenCL, OpenVINO, SYCL, HIP), Android arm64, and openEuler.
- An updated UI binary is also included in this release.
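
The release summary does not describe the mechanism of the concat bug. As general background on why quantized types need dedicated handling: block-quantized formats store elements in fixed-size blocks that share a scale, so a copy has to move whole blocks by byte size rather than individual elements. The sketch below is a standalone Python illustration that assumes a Q8_0-style layout (32 int8 values plus one fp16 scale per block). The function names are hypothetical, and this is not ggml's implementation.

```python
import struct

BLOCK_ELEMS = 32                # elements per block (Q8_0-style layout, assumed)
BLOCK_BYTES = 2 + BLOCK_ELEMS   # one fp16 scale followed by 32 int8 values


def quantize_row(values):
    """Pack floats into blocks of one fp16 scale plus 32 int8 quants."""
    if len(values) % BLOCK_ELEMS != 0:
        raise ValueError("row length must be a multiple of the block size")
    out = bytearray()
    for i in range(0, len(values), BLOCK_ELEMS):
        chunk = values[i:i + BLOCK_ELEMS]
        amax = max(abs(v) for v in chunk)
        scale = amax / 127.0
        # Round-trip through fp16 so quantization uses the scale that is stored.
        scale = struct.unpack("<e", struct.pack("<e", scale))[0]
        quants = [
            max(-127, min(127, round(v / scale))) if scale else 0
            for v in chunk
        ]
        out += struct.pack("<e", scale)
        out += struct.pack(f"<{BLOCK_ELEMS}b", *quants)
    return bytes(out)


def dequantize_row(data):
    """Expand packed blocks back into a list of floats."""
    if len(data) % BLOCK_BYTES != 0:
        raise ValueError("buffer does not contain whole blocks")
    out = []
    for off in range(0, len(data), BLOCK_BYTES):
        (scale,) = struct.unpack_from("<e", data, off)
        quants = struct.unpack_from(f"<{BLOCK_ELEMS}b", data, off + 2)
        out.extend(scale * q for q in quants)
    return out


def concat_rows(a, b):
    """Concatenate two quantized rows by copying whole blocks."""
    for buf in (a, b):
        if len(buf) % BLOCK_BYTES != 0:
            raise ValueError("buffer does not contain whole blocks")
    return a + b
```

In this layout, a 32-element row occupies 34 bytes rather than 32 or 128, so offsets computed from the element count or from a float element size would land inside a block. That is the general reason a concat over quantized data works in block units.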