llama.cpp b9862 release: CUDA optimization and multi-platform binaries

The llama.cpp project has released version b9862. It adds a CUDA performance optimization for the gated_delta_net operation and ships pre-built binaries for macOS, Linux, Windows, Android, and openEuler.

Removes redundant CUDA copies after gated_delta_net by detecting the gated_delta_net -> view -> cpy pattern.
Allows the CUDA GDN kernel to write state snapshots directly into the recurrent cache, skipping intermediate tail writes.
Disables KleidiAI support for macOS Apple Silicon in this release.
Provides binaries for Ubuntu x64/arm64/s390x with CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL backends.
Includes Windows builds for CPU, OpenCL Adreno, CUDA 12/13, Vulkan, OpenVINO, SYCL, and HIP.
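
The copy-elimination idea in the first two items can be illustrated with a toy graph rewrite. The sketch below is not llama.cpp or ggml code: the `Node` class, the `fuse_gdn_copies` function, and the `recurrent_cache` buffer name are all hypothetical, and it assumes the view has no consumer other than the copy. It only shows the general shape of detecting a gated_delta_net -> view -> cpy sequence and letting the producer write to the copy's destination directly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical graph node; not a ggml type."""
    op: str                        # e.g. "gated_delta_net", "view", "cpy"
    src: Optional["Node"] = None   # single input, enough for this toy graph
    dst: Optional[str] = None      # destination buffer name, if any


def fuse_gdn_copies(nodes: List[Node]) -> List[Node]:
    """Replace gated_delta_net -> view -> cpy with one node that writes to dst.

    Assumes the nodes are in execution order and that the view is consumed
    only by the copy that follows it.
    """
    out: List[Node] = []
    i = 0
    while i < len(nodes):
        n = nodes[i]
        is_pattern = (
            n.op == "gated_delta_net"
            and i + 2 < len(nodes)
            and nodes[i + 1].op == "view"
            and nodes[i + 1].src is n
            and nodes[i + 2].op == "cpy"
            and nodes[i + 2].src is nodes[i + 1]
        )
        if is_pattern:
            # The producer takes over the copy's destination, so the
            # intermediate view and the copy are no longer needed.
            out.append(Node("gated_delta_net", src=n.src, dst=nodes[i + 2].dst))
            i += 3
        else:
            out.append(n)
            i += 1
    return out


if __name__ == "__main__":
    gdn = Node("gated_delta_net")
    view = Node("view", src=gdn)
    cpy = Node("cpy", src=view, dst="recurrent_cache")
    fused = fuse_gdn_copies([gdn, view, cpy])
    print([(n.op, n.dst) for n in fused])
```

In this toy version the three-node sequence collapses into a single node whose destination is the cache buffer, which mirrors the described effect of skipping the intermediate write and the follow-up copy.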

This update reduces copy overhead on the CUDA backend for models that use gated_delta_net, while the pre-built binaries continue to cover a broad range of operating systems and hardware accelerators.