MoonMath AI has open-sourced a bf16 forward attention kernel for AMD's MI300X GPU, written in HIP rather than assembly. It outperforms AMD's own AITER v3 kernel across all tested shapes and rounding modes, with speedups of up to 1.26x, while keeping its numerical output bit-identical.
MoonMath AI Open-Sources HIP Attention Kernel That Beats AITER v3 on MI300X