llama.cpp release b9714 adds the "X-Accel-Buffering": "no" response header to its streaming endpoints so that an Nginx reverse proxy passes chunks through as they are generated instead of buffering them. This resolves streaming issues with applications such as the Pi coding harness. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options.
llama.cpp release b9714 adds X-Accel-Buffering header and new binaries
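To illustrate what the header does, here is a minimal sketch of a streaming endpoint that opts out of Nginx response buffering. It is not llama.cpp's server code: the helper `streaming_headers`, the `StreamHandler` class, the port, and the event payloads are all hypothetical, and the sketch assumes a server-sent-events style response. The only part taken from the release note is the "X-Accel-Buffering": "no" header itself, which Nginx reads from the upstream response to disable buffering for that response.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def streaming_headers():
    """Hypothetical helper: headers for a server-sent-events style response."""
    return {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        # Tells an Nginx proxy in front of this server not to buffer the response.
        "X-Accel-Buffering": "no",
    }


class StreamHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that streams three small events."""

    def do_GET(self):
        self.send_response(200)
        for name, value in streaming_headers().items():
            self.send_header(name, value)
        self.end_headers()
        for i in range(3):
            # Each event is written and flushed immediately; without the header,
            # a buffering proxy could hold these back until the response ends.
            self.wfile.write(f"data: chunk {i}\n\n".encode())
            self.wfile.flush()


if __name__ == "__main__":
    # Assumed address and port, for local experimentation only.
    HTTPServer(("127.0.0.1", 8080), StreamHandler).serve_forever()
```

Setting the header on the response means the behavior travels with the server, so no change to the Nginx configuration is needed for these endpoints.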