A pull request to the llama.cpp repository uses hipBLAS for dense prefill operations on AMD GPUs with the gfx900 architecture. The change specifically targets legacy Vega hardware, including the Radeon RX Vega 56/64 and the Radeon Pro Vega series.

  • Performance gains average approximately 40% across the three tested models.
  • Qwen3.5 4B: 36.1% faster.
  • Qwen3.6 27B: 18.9% faster.
  • Gemma4 12B: 65.1% faster, the largest gain of the three.
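
As a quick sanity check on the figures above, the short Python sketch below recomputes the average from the three per-model gains. It assumes the "approximately 40%" figure is a simple unweighted mean of those three percentages, which the summary does not state explicitly. The variable names are illustrative only.

```python
# Per-model gains in percent, copied from the list above.
gains = {
    "Qwen3.5 4B": 36.1,
    "Qwen3.6 27B": 18.9,
    "Gemma4 12B": 65.1,
}

# Assumption: the reported average is an unweighted arithmetic mean.
mean_gain = sum(gains.values()) / len(gains)
print(f"mean gain: {mean_gain:.1f}%")  # prints "mean gain: 40.0%"

# The same gains expressed as speedup factors (e.g. +36.1% -> 1.361x).
speedups = {model: 1 + pct / 100 for model, pct in gains.items()}
for model, factor in speedups.items():
    print(f"{model}: {factor:.3f}x")
```

Under that assumption the mean comes out at about 40.0%, which matches the stated average.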

For users still running older AMD Vega hardware, the update delivers a substantial speedup and addresses performance limitations specific to this architecture.