llama.cpp b9828 release: OpenCL Flash Attention improvements and new binaries
The llama.cpp b9828 release reworks the OpenCL backend's Flash Attention kernels for f16 and f32 precision. The update also adds new prefill prepass kernels and support for the q4_0 and q8_0 quantization formats.
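For readers unfamiliar with the q8_0 format named above, the following is a minimal Python sketch of block-wise 8-bit quantization as q8_0 is generally laid out in ggml: 32 values per block, sharing one half-precision scale. The function names are hypothetical and chosen for illustration only. This is not the OpenCL kernel code from the release, and rounding at exact .5 ties may differ from the C implementation.

```python
import struct

QK8_0 = 32  # values per block (ggml's q8_0 block size)


def to_fp16(x):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_q8_0_block(values):
    """Illustrative helper: quantize 32 floats to (scale, 32 signed 8-bit ints)."""
    if len(values) != QK8_0:
        raise ValueError("a q8_0 block holds exactly 32 values")
    amax = max(abs(v) for v in values)      # largest magnitude in the block
    d = amax / 127.0                        # scale mapping amax onto 127
    inv = 1.0 / d if d else 0.0             # all-zero blocks get scale 0
    qs = [round(v * inv) for v in values]   # each result lies in [-127, 127]
    return to_fp16(d), qs                   # the scale is stored as fp16


def dequantize_q8_0_block(d, qs):
    """Illustrative helper: reconstruct approximate floats from one block."""
    return [d * q for q in qs]


if __name__ == "__main__":
    block = [float(i - 16) for i in range(QK8_0)]
    d, qs = quantize_q8_0_block(block)
    restored = dequantize_q8_0_block(d, qs)
    worst = max(abs(a - b) for a, b in zip(block, restored))
    print("scale:", d, "worst absolute error:", worst)
```

Under this layout each block costs 32 bytes of quantized values plus 2 bytes of scale, or 8.5 bits per value, which is the main reason such formats reduce memory traffic compared with f16 or f32 storage.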