llama.cpp b9820 release: reduced CUDA syncs and new binaries
The llama.cpp b9820 release introduces performance improvements by reintroducing less synchronizations during split compute, specifically targeting CUDA backends. This update also provides pre-built binaries for macOS, Linux, Windows, Android, and openEuler across CPU, GPU, and specialized hardware accelerators.