A user compiled llama.cpp with both the CUDA and Vulkan backends enabled so that two GPUs, a W7800 and another card, could be used together. With the MiniMax-M3-UD-IQ2_M-00001-of-00004.gguf model, the setup showed a 10% increase in decode speed (tokens/sec), and the user plans to run benchmarks to check how much of that gain is real.
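
One way to put a number on the planned benchmark comparison is to average several decode-speed readings per configuration and report the relative change. The sketch below is only an illustration: the function name `percent_gain` and the sample readings are hypothetical placeholders, not measurements from the post.

```python
from statistics import mean


def percent_gain(baseline_tps, candidate_tps):
    """Return the relative change in mean tokens/sec, as a percentage.

    baseline_tps  -- tokens/sec readings from the reference configuration
    candidate_tps -- tokens/sec readings from the configuration under test
    """
    if not baseline_tps or not candidate_tps:
        raise ValueError("both configurations need at least one reading")
    base = mean(baseline_tps)
    if base <= 0:
        raise ValueError("baseline mean must be positive")
    return 100.0 * (mean(candidate_tps) - base) / base


if __name__ == "__main__":
    # Placeholder readings for illustration only, not real benchmark data.
    single_backend = [20.0, 20.0]
    cuda_plus_vulkan = [22.0, 22.0]
    print(f"{percent_gain(single_backend, cuda_plus_vulkan):+.1f}% tokens/sec")
```

Averaging over repeated runs matters here because a difference of around 10% can fall within run-to-run noise if only a single reading is taken per configuration.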