The llama.cpp b9864 release introduces a change to the server's Server-Sent Events (SSE) handling, allowing the ping interval to be configured on a per-request basis. This update ensures that slow prefill operations do not drop healthy connections by pinging silent streams every 1 second and kicking them only after 3 seconds.

  • The global default for sse_ping_interval returns to 30, maintaining API client behavior while the WebUI sends sse_ping_interval: 1 in the request body.
  • The field is now a typed field_num with hard limits (-1, INT32_MAX) bound to task_params, providing free type and range validation.
  • macOS builds include Apple Silicon (arm64), Intel (x64), and iOS XCFramework, with KleidiAI disabled.
  • Linux binaries are available for Ubuntu x64 and arm64 (CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16).
  • Windows supports CPU, OpenCL Adreno, CUDA 12/13, Vulkan, OpenVINO, SYCL, and HIP.
  • Android arm64 (CPU) and openEuler x86/aarch64 (ACL Graph) builds are also provided.

This update helps users by preventing connection drops during slow prefill phases while allowing the WebUI to declare its specific visibility-kick cadence needs.