Upgraded my budget build to multi-GPU for inference
A user upgraded a budget PC to two RTX 3090s plus an Intel Arc A770 to test multi-GPU inference performance with llama.cpp. The main finding is that the Vulkan backend uses substantially more GPU memory than the CUDA backend. The user concludes that this overhead makes Vulkan unsuitable for mixed-vendor setups such as this NVIDIA-plus-Intel combination.
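Per-device backend overhead adds up across every card in the split. The sketch below is a rough illustration of that, not the user's method. It subtracts an assumed, uniform per-GPU overhead from each card's VRAM and formats the remainder as proportions for llama.cpp's `--tensor-split` option. The `Gpu` class and the `usable_vram` and `tensor_split` helpers are hypothetical. The A770 is assumed to be the 16 GB variant. The overhead figures are placeholders, not numbers reported in the post.

```python
# Hypothetical sketch: estimate usable VRAM per GPU after an assumed
# per-device backend overhead, then format it as proportions for
# llama.cpp's --tensor-split option.
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    vram_gib: float  # total VRAM on the card, in GiB


def usable_vram(gpus, overhead_gib):
    """Subtract the same assumed overhead (GiB) from each card, floored at 0."""
    return [max(g.vram_gib - overhead_gib, 0.0) for g in gpus]


def tensor_split(gpus, overhead_gib):
    """Return a comma-separated proportion string, e.g. '22,22,14'."""
    return ",".join(f"{u:g}" for u in usable_vram(gpus, overhead_gib))


gpus = [
    Gpu("RTX 3090 #0", 24.0),
    Gpu("RTX 3090 #1", 24.0),
    Gpu("Arc A770", 16.0),  # assumption: the 16 GB variant
]

# Placeholder overheads for illustration only; NOT measurements from the post.
for label, overhead in [("low-overhead backend", 0.5), ("high-overhead backend", 2.0)]:
    total = sum(usable_vram(gpus, overhead))
    print(f"{label}: usable {total:g} GiB, --tensor-split {tensor_split(gpus, overhead)}")
```

With these made-up numbers, moving from 0.5 GiB to 2 GiB of overhead per card removes 4.5 GiB of usable VRAM across three GPUs. This is why a per-device cost matters more as cards are added. Measured overheads would need to come from actual runs.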