The llama.cpp project has released version b9874, which introduces a new CUDA implementation for concatenating quantized types. The change extends the set of operations the CUDA backend can run on quantized data.
- The primary code change adds CUDA support for concatenating quantized data types.
- The release includes binaries for macOS (Apple Silicon and Intel), Linux (CPU, Vulkan, ROCm, OpenVINO, SYCL), Android, Windows (CPU, OpenCL, CUDA 12/13, Vulkan, OpenVINO, SYCL, HIP), and openEuler.
- An iOS XCFramework and a standalone UI build are also provided in this release.
Users who need quantized concatenation on CUDA, or a current prebuilt binary for any of the platforms listed above, can get both from the b9874 release.
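For readers unfamiliar with what concatenating quantized types involves, the following is a minimal CPU-side sketch in Python, not the CUDA kernel shipped in b9874. It assumes a block layout similar to ggml's Q8_0 (32 values sharing one scale), stores the scale as a plain Python float rather than a half-precision value, and assumes the concatenation boundary falls on a block boundary. The function names `quantize_q8_0_like`, `dequantize`, and `concat_blocks` are hypothetical and chosen only for illustration.

```python
BLOCK = 32  # values per block, mirroring a Q8_0-style layout (assumption)


def quantize_q8_0_like(values):
    """Split floats into blocks of BLOCK values, each with one scale and int8 quants."""
    if len(values) % BLOCK != 0:
        raise ValueError("length must be a multiple of the block size")
    blocks = []
    for start in range(0, len(values), BLOCK):
        chunk = values[start:start + BLOCK]
        amax = max(abs(v) for v in chunk)
        # One scale per block; an all-zero block keeps a zero scale.
        scale = amax / 127.0 if amax > 0 else 0.0
        quants = [round(v / scale) if scale else 0 for v in chunk]
        blocks.append((scale, quants))
    return blocks


def dequantize(blocks):
    """Recover approximate floats from (scale, quants) blocks."""
    return [scale * q for scale, quants in blocks for q in quants]


def concat_blocks(a, b):
    """Concatenate two quantized tensors by copying whole blocks unchanged."""
    return a + b


# Two hypothetical 1-D tensors of 32 and 64 elements.
left = quantize_q8_0_like([0.1 * i for i in range(32)])
right = quantize_q8_0_like([-0.05 * i for i in range(64)])
joined = concat_blocks(left, right)
```

Under these assumptions, each block carries its own scale, so blocks can be copied as-is without dequantizing or requantizing; dequantizing `joined` gives the same values as dequantizing `left` and `right` separately and joining the results. A GPU implementation would do the equivalent copy in parallel, and would need extra care if the boundary did not align with the block size.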