The llama.cpp project released version b9873, which fixes a critical assertion failure during graph operations when the key/value rotation (k_rot/v_rot) tensors exist but their backing buffer is unallocated.

  • Fixes an abort triggered when ggml_backend_buffer_is_host() is called on a NULL buffer, which happens when the tensor pointer is non-null but its buffer has not been allocated.
  • Adds a guard to check for buffer allocation before processing k_rot/v_rot inputs, consistent with existing checks for kq_mask inputs.
  • Resolves issue #25191 related to DFlash speculative decoding's KV-injection pass.
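
The guard described in the bullets above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual llama.cpp patch: `mock_buffer`, `mock_tensor`, `mock_buffer_is_host()` and `set_rot_input()` are hypothetical stand-ins invented for this example, and the only behavior taken from the notes is that the host check aborts on a NULL buffer and that the fix skips inputs whose buffer is unallocated.

```cpp
#include <cassert>

// Hypothetical stand-ins for the ggml buffer and tensor types (assumption:
// simplified for illustration, these are NOT the real ggml definitions).
struct mock_buffer {
    bool is_host;
};

struct mock_tensor {
    mock_buffer * buffer; // stays nullptr until an allocator assigns a buffer
};

// Mimics the failure mode from the notes: the host check asserts that the
// buffer is non-null, so calling it on an unallocated buffer aborts.
static bool mock_buffer_is_host(const mock_buffer * buf) {
    assert(buf != nullptr && "buffer must be allocated");
    return buf->is_host;
}

// Hypothetical input setter for a k_rot/v_rot style tensor.
// Returns true if the input was processed, false if it was skipped.
static bool set_rot_input(const mock_tensor * t) {
    // Guard: a non-null tensor may still have no backing buffer, for example
    // in a pass that stores K/V without attending. Skip it instead of aborting.
    if (t == nullptr || t->buffer == nullptr) {
        return false;
    }

    // Safe to query now, because the buffer is known to be allocated.
    assert(mock_buffer_is_host(t->buffer));

    // ... a real implementation would write the rotation data here ...
    return true;
}
```

Under these assumptions, calling `set_rot_input()` on a tensor with a host buffer returns `true`, while a tensor whose `buffer` is `nullptr` is skipped and returns `false` rather than reaching the asserting host check.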

This change prevents crashes in workflows that store K/V without attending, such as the KV-injection pass in DFlash speculative decoding.