A Reddit user is asking for concrete memory-consumption figures for large Mixture of Experts (MoE) models in order to plan a future hardware build with 256GB or 512GB of DRAM and 48GB of VRAM. The user wants to download models now, as 16-bit safetensors or GGUFs, but needs exact sizes for the various quantizations (Q2, Q3, Q4) to avoid miscalculating storage.
- Specific interest in memory usage with unquantized KV cache for GLM5.2, Kimi K2.x, DeepSeekV3.2, V4, Mimo, Qwen 397b, MiniMax M3, and MiniMax M2.x.
- Comparison of quantization formats like IQ4_XS, Q4_K_S, Q4_K_M, and IQ3_XXS for compatibility with llama.cpp, LMStudio, vLLM, SGLang, and Kobold.
- Inquiry into Linux kernel limits on memory usage for rigs with large DRAM but limited VRAM, specifically whether the system stays stable when usage approaches 90-100% of capacity.
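The question about running near 90-100% capacity can at least be observed on a given machine. The following is a minimal sketch, assuming a Linux system that exposes `MemTotal` and `MemAvailable` in `/proc/meminfo`. The helper names (`parse_meminfo`, `headroom_fraction`, `read_meminfo`) are hypothetical, and the sketch only reports headroom; it does not say where the kernel becomes unstable.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts with no unit.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def headroom_fraction(info: dict) -> float:
    """Fraction of MemTotal that the kernel estimates is still available."""
    return info["MemAvailable"] / info["MemTotal"]


def read_meminfo(path: str = "/proc/meminfo") -> dict:
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


if __name__ == "__main__":
    info = read_meminfo()
    print(f"{headroom_fraction(info):.1%} of DRAM still available")
```

Run while a model is loaded, this gives a rough view of how close the system is to the 90-100% range the poster is worried about.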
The user wants real-world data to determine which quant sizes fit within their target memory constraints without causing out-of-memory errors or instability.
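Until measured numbers are available, the fit question can be approximated with back-of-the-envelope arithmetic. The sketch below is an estimate under stated assumptions, not real-world data. It treats weight size as parameters times bits per weight, uses the standard K+V cache formula for grouped-query or multi-head attention (models with other attention designs, such as latent-attention variants, will differ), and reserves a safety margin through a hypothetical `usable_fraction` parameter. Every number in the example is a placeholder rather than a specification of any model named above.

```python
GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in bytes.

    Real GGUF files mix tensor types, so the effective bits per weight
    of a given quant can differ from its nominal value.
    """
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """K+V cache size for GQA/MHA attention; 2 bytes per element = 16-bit cache."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def fits(total_bytes: float, dram_gib: float, vram_gib: float,
         usable_fraction: float = 0.9) -> bool:
    """True if total_bytes fits in the combined DRAM + VRAM budget.

    usable_fraction is an assumed safety margin, not a kernel limit.
    """
    budget = (dram_gib + vram_gib) * GIB * usable_fraction
    return total_bytes <= budget


# Placeholder inputs: a hypothetical 400B-parameter model at 4.25 bits per
# weight, with a made-up attention shape and a 32k-token context.
weights = weight_bytes(400e9, 4.25)
cache = kv_cache_bytes(n_layers=60, n_kv_heads=8, head_dim=128, context_len=32768)
total = weights + cache

print(f"weights ~ {weights / GIB:.1f} GiB, KV cache ~ {cache / GIB:.1f} GiB")
print("fits in 256GB DRAM + 48GB VRAM:", fits(total, 256, 48))
```

The estimate leaves out compute buffers, runtime overhead, and the operating system's own memory use, so it is best read as a lower bound that measured figures would then refine.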