Home Lab: 4x Modded 4090s for Local LLM Inference

A user details a high-performance local inference setup built on four modified NVIDIA RTX 4090 GPUs with a combined 192GB of VRAM, paired with a WRX90E-SAGE SE motherboard and a 3000W power supply.

Hardware includes 128GB of DDR5 RAM, a Pro WS WRX90E-SAGE SE motherboard, and a 3000W PSU connected to a 240V dryer line.
The system runs in a laundry room with automated exhaust triggered at 79°F to manage heat from the GPUs.
The use case is a private Jarvis-class assistant with voice verification, long-term memory, and Home Assistant integration.
Gemma 4 31B QAT is identified as the top-performing model, with MiMo V2.5 showing promising speed despite minor looping issues.
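
The exhaust trigger described above amounts to a simple threshold rule. The post does not say how it is implemented, so the following is only a minimal sketch: the 79°F trigger comes from the post, while the 2°F hysteresis band (to keep the exhaust from rapidly toggling near the threshold) and the `ExhaustController` class are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

TRIGGER_F = 79.0     # from the post: exhaust turns on at 79°F
HYSTERESIS_F = 2.0   # assumption: turn off only once 2°F below the trigger


@dataclass
class ExhaustController:
    """Hypothetical on/off controller with hysteresis (not the author's code)."""
    trigger_f: float = TRIGGER_F
    hysteresis_f: float = HYSTERESIS_F
    exhaust_on: bool = False

    def update(self, room_temp_f: float) -> bool:
        """Feed in a new room temperature reading; return the exhaust state."""
        if room_temp_f >= self.trigger_f:
            self.exhaust_on = True
        elif room_temp_f <= self.trigger_f - self.hysteresis_f:
            self.exhaust_on = False
        # Between the two thresholds the previous state is kept.
        return self.exhaust_on


if __name__ == "__main__":
    controller = ExhaustController()
    for reading in [75.0, 78.5, 79.0, 78.0, 77.0, 76.5]:
        print(f"{reading:.1f}°F -> exhaust on: {controller.update(reading)}")
```

In a real setup the returned state would drive whatever switches the exhaust (for example a smart plug exposed through Home Assistant), which is left out here because the post gives no details.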

The author notes that while the setup generates significant heat and noise, it effectively supports complex voice capabilities and continuous conversation features for a personal AI assistant.
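
To put the heat in perspective, here is a rough upper-bound calculation. It assumes the system draws the PSU's full 3000W rating from the 240V line, which is a worst case rather than a measurement from the post; the helper name `full_load_estimates` is hypothetical. Nearly all electrical power drawn by a computer ends up as heat in the room, so watts convert directly to BTU per hour.

```python
WATTS_TO_BTU_PER_HOUR = 3.412142  # standard conversion: 1 W = 3.412142 BTU/h


def full_load_estimates(psu_watts: float = 3000.0, line_volts: float = 240.0) -> dict:
    """Worst-case line current and heat output, assuming draw equals the PSU rating."""
    return {
        "amps": psu_watts / line_volts,
        "btu_per_hour": psu_watts * WATTS_TO_BTU_PER_HOUR,
    }


if __name__ == "__main__":
    est = full_load_estimates()
    print(f"Line current: {est['amps']:.1f} A")
    print(f"Heat output:  {est['btu_per_hour']:.0f} BTU/h")
```

Under these assumptions the machine would pull about 12.5 A and shed roughly 10,000 BTU/h, comparable to a sizable space heater, which is consistent with the need for dedicated exhaust.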