The llama.cpp b9864 release introduces a change to the server's Server-Sent Events (SSE) handling, allowing the ping interval to be configured on a per-request basis. This update ensures that slow prefill operations do not drop healthy connections by pinging silent streams every 1 second and kicking them only after 3 seconds.
- The global default for sse_ping_interval returns to 30, maintaining API client behavior while the WebUI sends sse_ping_interval: 1 in the request body.
- The field is now a typed field_num with hard limits (-1, INT32_MAX) bound to task_params, providing free type and range validation.
- macOS builds include Apple Silicon (arm64), Intel (x64), and iOS XCFramework, with KleidiAI disabled.
- Linux binaries are available for Ubuntu x64 and arm64 (CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16).
- Windows supports CPU, OpenCL Adreno, CUDA 12/13, Vulkan, OpenVINO, SYCL, and HIP.
- Android arm64 (CPU) and openEuler x86/aarch64 (ACL Graph) builds are also provided.
This update helps users by preventing connection drops during slow prefill phases while allowing the WebUI to declare its specific visibility-kick cadence needs.