The llama.cpp project has released version b9837, which introduces a new `--reasoning-preserve` flag that makes Jinja chat templates retain reasoning tokens. The update also corrects help message text and provides pre-built binaries for macOS, Linux, Windows, Android, and openEuler across a range of hardware backends.
- Added a `--reasoning-preserve` flag to Jinja chat-template handling to preserve reasoning content.
- Corrected help message text.
- Disabled KleidiAI support for macOS Apple Silicon builds.
- Released pre-built binaries for Ubuntu (CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL), Windows (CPU, CUDA 12/13, Vulkan, OpenVINO, SYCL, HIP), and macOS (Apple Silicon and Intel).
This release makes the latest llama.cpp changes available on a wide range of platforms and hardware accelerators, and it gives users explicit control over whether reasoning tokens are kept when chat templates are applied.