Cheapest way to run GLM 5.x locally without unified memory
A user is looking for the cheapest way to run GLM 5.x locally at a 4-bit quantization such as IQ4_XS, without buying unified-memory hardware. The options under consideration are a CPU-only build (for example, a Sapphire Rapids ES, i.e. engineering sample, paired with DDR5), offloading across multiple GPUs, or settling for similar-sized models. The user's current system is a Ryzen 9 5900X with 128 GB of DDR4 and a 20 GB Radeon RX 7900 XT, which already runs Minimax 2.7 at Q4_K_S and Qwen 3.6 27B at IQ4_XS.
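To compare these options, it helps to estimate how much memory a quantized model's weights need. The sketch below is a rough back-of-the-envelope calculation, not a measurement: the function names are hypothetical, the bits-per-weight figures are approximate averages commonly quoted for llama.cpp quant types (real GGUF files vary by architecture), and the reserved headroom for KV cache, OS, and runtime buffers is an assumed placeholder to tune for the actual setup.

```python
# Approximate average bits per weight (assumed; real GGUF files differ a bit).
APPROX_BPW = {
    "IQ4_XS": 4.25,
    "Q4_K_S": 4.58,
}


def weights_size_gib(params_billions: float, bits_per_weight: float) -> float:
    """Estimate the size of the quantized weights in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def placement(size_gib: float, vram_gib: float, ram_gib: float,
              vram_reserve_gib: float = 2.0, ram_reserve_gib: float = 8.0) -> str:
    """Classify where the weights can live.

    The reserve values are assumptions standing in for KV cache,
    compute buffers, and the operating system.
    """
    usable_vram = max(vram_gib - vram_reserve_gib, 0.0)
    usable_ram = max(ram_gib - ram_reserve_gib, 0.0)
    if size_gib <= usable_vram:
        return "fits in VRAM"
    if size_gib <= usable_vram + usable_ram:
        return "needs CPU/GPU split"
    return "does not fit"


if __name__ == "__main__":
    # The 27B model at IQ4_XS on the 20 GB GPU + 128 GB RAM system from the post.
    size = weights_size_gib(27, APPROX_BPW["IQ4_XS"])
    print(f"{size:.1f} GiB -> {placement(size, vram_gib=20, ram_gib=128)}")
```

To apply this to GLM 5.x, pass the model's actual parameter count from its model card; the post does not state it, so none is assumed here. For mixture-of-experts models the total parameter count determines the memory footprint, even though only a fraction of the weights is read per token.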