The llama.cpp project released version b9873, which fixes a critical assertion failure triggered when graph inputs are processed while the key/value rotation buffer is unallocated.
- Fixes an abort caused by calling ggml_backend_buffer_is_host() on a NULL buffer when the tensor pointer is non-null but the buffer is unallocated.
- Adds a guard that checks whether the buffer is allocated before processing k_rot/v_rot inputs, mirroring the existing check for kq_mask inputs.
- Resolves issue #25191 related to DFlash speculative decoding's KV-injection pass.
This change prevents crashes in workflows that store K/V without attending, such as the KV-injection pass used by DFlash speculative decoding.