The llama.cpp project has released version b9870, which includes a fix for long reasoning loops caused by the StepFun parser. The update moves message trimming logic ahead of rendering to properly handle content parts and whitespace.

  • Fixes long reasoning loops by trimming messages sent to the StepFun parser before rendering.
  • Applies trimming to content_parts text, string content, and reasoning_content.
  • Adds a regression test for content parts and removes a duplicate template.
  • Disables macOS Apple Silicon (arm64, KleidiAI enabled) builds.

This release provides updated binaries for macOS, Linux, Windows, Android, and openEuler across various CPU and GPU backends, ensuring the parser fix is available to users on supported platforms.