llama.cpp release b9848 fixes the CUDA backend's `get_rows_back` function for tables with more than 65535 rows. The fix corrects the grid-y clamp and stride logic that previously broke these large-table operations; 65535 is the maximum size CUDA allows for the y dimension of a launch grid.
- Fixed CUDA `get_rows_back` for tables with more than 65535 rows by correcting grid-y clamp and stride logic (PR #25103).
- macOS Apple Silicon KleidiAI support is disabled in this release.
- openEuler standard builds are disabled, but the 310p and 910b ACL Graph builds remain available for both x86 and aarch64.
- Binaries are provided for macOS (Apple Silicon arm64 and Intel x64), Linux (Ubuntu CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Android (arm64 CPU), Windows (CPU, OpenCL Adreno, CUDA 12/13, Vulkan, OpenVINO, SYCL, HIP), and the standalone UI.
In short, b9848 matters most to CUDA users who run `get_rows_back` on tables with more than 65535 rows; pre-built binaries remain available for the operating systems and hardware accelerators listed above.
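
To make the grid-y clamp and stride idea concrete, here is a minimal CPU-side sketch in Python. It is not the code from PR #25103. It assumes that `get_rows_back` accumulates each gradient row into the destination row named by its index, and that one grid-y block handles one destination row. The names `get_rows_back_sim` and `max_grid_y` are hypothetical and exist only for this illustration.

```python
MAX_GRID_Y = 65535  # CUDA's limit on the y dimension of a launch grid


def get_rows_back_sim(grad, idx, n_dst_rows, max_grid_y=MAX_GRID_Y):
    """Simulate a clamped grid-y launch with a stride loop (illustrative only).

    grad:       list of gradient rows, one per entry in idx
    idx:        destination row index for each gradient row
    n_dst_rows: number of rows in the destination table
    """
    ncols = len(grad[0]) if grad else 0
    dst = [[0.0] * ncols for _ in range(n_dst_rows)]

    # Clamp: never launch more y-blocks than the hardware allows.
    grid_y = min(n_dst_rows, max_grid_y)

    for block_y in range(grid_y):  # stands in for blockIdx.y
        # Stride: each block also covers rows block_y + grid_y, + 2*grid_y, ...
        # Without this loop, rows at or beyond grid_y would never be written.
        for dst_row in range(block_y, n_dst_rows, grid_y):
            for i, src_row in enumerate(idx):
                if src_row == dst_row:
                    for c in range(ncols):
                        dst[dst_row][c] += grad[i][c]
    return dst


if __name__ == "__main__":
    # A table larger than the grid-y limit: the last row is still reached.
    out = get_rows_back_sim([[1.0]], [69999], 70000)
    print(out[69999])
```

Passing a small `max_grid_y` (for example 4 with a 10-row table) exercises the same stride path without needing a large table, which is convenient when checking the logic by hand.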