media r/LocalLLaMA · 8d ago · open_models

I didn't know it was possible to compile llamacpp to run CUDA + Vulkan at the same time

A user compiled llama.cpp with both the CUDA and Vulkan backends enabled so that a single build can use two GPUs at once: a W7800 and another card. The user reports roughly 10% more tokens/sec during decoding with the MiniMax-M3-UD-IQ2_M-00001-of-00004.gguf model, and plans to run benchmarks to check how much of that gain is real.
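The post does not show the build command, so the following is only a sketch of what such a dual-backend build might look like. It assumes the `GGML_CUDA` and `GGML_VULKAN` CMake options described in llama.cpp's build documentation; the helper names `cmake_commands` and `relative_gain` are hypothetical and exist only for this illustration.

```python
import shlex

# Assumption: GGML_CUDA and GGML_VULKAN are the CMake options that enable
# these backends in llama.cpp. The original post does not show its command.
BACKEND_FLAGS = {"cuda": "GGML_CUDA", "vulkan": "GGML_VULKAN"}


def cmake_commands(backends, build_dir="build"):
    """Return the configure and build commands as argument lists."""
    configure = ["cmake", "-B", build_dir]
    for name in backends:
        # One -D<OPTION>=ON switch per requested backend.
        configure.append(f"-D{BACKEND_FLAGS[name]}=ON")
    build = ["cmake", "--build", build_dir, "--config", "Release"]
    return configure, build


def relative_gain(baseline_tps, combined_tps):
    """Percent change in tokens/sec relative to the baseline run."""
    return (combined_tps - baseline_tps) / baseline_tps * 100.0


if __name__ == "__main__":
    # Only prints the commands; nothing is compiled here.
    for cmd in cmake_commands(["cuda", "vulkan"]):
        print(shlex.join(cmd))
```

The script only prints the two commands, so it can be inspected before anything is run; an actual build would presumably also need the CUDA toolkit and the Vulkan SDK installed. `relative_gain` is one way to turn two measured decode speeds (for example from separate single-backend and dual-backend benchmark runs) into the kind of percentage quoted above.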

Importance 1/3 · r/LocalLLaMA · Code generation · Inference efficiency · Open weights
