A user on r/LocalLLaMA is considering upgrading from two RTX 3090 GPUs to four RTX 5070 Ti cards and wants to know how the change would affect single-stream inference performance.
- The proposed configuration uses an Asus ProArt Creator B850 Neo motherboard with a PCIe 5.0 x4/x4/x4/x4 lane distribution.
- Occupying both primary x16 slots splits the CPU's 16 lanes into PCIe 5.0 x8/x8 mode, while two M.2 slots receive dedicated full-speed connections.
- The user seeks community feedback on performance for Qwen 3.6 27B with base 4-bit weights and an 8-bit KV cache.
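
For a sense of scale, the link widths and quantization mentioned above can be turned into rough numbers. The sketch below is back-of-the-envelope arithmetic, not a benchmark: it assumes the standard PCIe 5.0 rate of 32 GT/s per lane with 128b/130b encoding, reads "27b" as 27 billion parameters, and ignores quantization scales and other metadata, so real 4-bit files will be somewhat larger. The function names are illustrative only.

```python
# Back-of-the-envelope figures only; not a benchmark or a throughput prediction.

PCIE5_GT_PER_S = 32.0            # PCIe 5.0 raw signalling rate per lane (GT/s)
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def pcie5_bandwidth_gb_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return PCIE5_GT_PER_S * ENCODING_EFFICIENCY / 8 * lanes


def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (assumption: no scales or metadata counted)."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for lanes in (4, 8, 16):
        bw = pcie5_bandwidth_gb_s(lanes)
        print(f"PCIe 5.0 x{lanes}: ~{bw:.1f} GB/s per direction")
    print(f"27B weights at 4-bit: ~{weight_footprint_gb(27, 4):.1f} GB")
```

These are theoretical ceilings. How much of the link bandwidth single-stream inference actually uses depends on how the model is split across the cards, which is the question the post puts to the community.
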
The discussion highlights skepticism toward Google's conservative predictions that PCIe lanes will bottleneck inference speeds, noting a previous instance where actual speed increases significantly exceeded online estimates.