User seeks memory usage data for large MoE models on future high-RAM rigs

A Reddit user is requesting specific memory consumption data for large Mixture of Experts (MoE) models to plan a future hardware build with 256GB or 512GB DRAM and 48GB VRAM. The user wants to download the models now, as 16-bit safetensors or GGUF files, but needs exact sizes for each quantization level (Q2, Q3, Q4) to avoid miscalculating storage.

Specific interest in memory usage with unquantized KV cache for GLM5.2, Kimi K2.x, DeepSeekV3.2, V4, Mimo, Qwen 397b, MiniMax M3, and MiniMax M2.x.
Comparison of quantization formats like IQ4_XS, Q4_K_S, Q4_K_M, and IQ3_XXS for compatibility with llama.cpp, LM Studio, vLLM, SGLang, and Kobold.
Inquiry into Linux kernel limits for memory usage on rigs with large DRAM but limited VRAM, specifically regarding stability near 90-100% capacity.
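Until measured numbers are available, the sizes asked about above can be roughly estimated from parameter count and bits per weight. The sketch below is a back-of-envelope calculation, not real-world data. The bits-per-weight values are approximate assumptions (actual GGUF files mix tensor types and vary by model), and the KV cache formula assumes a standard attention layout with separate K and V tensors per layer, so models that compress the cache will not follow it. All function and parameter names are hypothetical.

```python
# Approximate bits per weight for each format. These are ASSUMED values for
# illustration; real files vary because quant schemes mix tensor types.
APPROX_BPW = {
    "F16": 16.0,
    "Q4_K_M": 4.85,
    "Q4_K_S": 4.6,
    "IQ4_XS": 4.25,
    "IQ3_XXS": 3.06,
}


def weights_gib(total_params: float, quant: str) -> float:
    """Estimate weight size in GiB from total parameter count and format."""
    total_bits = total_params * APPROX_BPW[quant]
    return total_bits / 8 / 2**30


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Estimate unquantized KV cache size in GiB.

    Assumes one K and one V tensor per layer (standard MHA/GQA layout);
    bytes_per_elem=2 corresponds to a 16-bit cache.
    """
    elems = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elems * bytes_per_elem / 2**30


if __name__ == "__main__":
    # 397e9 is taken from the "Qwen 397b" name; the attention dimensions
    # below are hypothetical placeholders, not the real model config.
    for quant in APPROX_BPW:
        print(f"{quant:8s} ~{weights_gib(397e9, quant):7.1f} GiB")
    print(f"KV cache ~{kv_cache_gib(60, 8, 128, 32768):.1f} GiB")
```

For an MoE model, use the total parameter count rather than the active count, since all experts must be resident even though only a few run per token.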

The user wants real-world data to determine which quant sizes fit within their target memory constraints without causing out-of-memory errors or instability.
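A fit check of this kind could be sketched as follows. It is a simplification under stated assumptions: DRAM and VRAM are treated as one pool (which presumes the runtime can split the model between CPU and GPU), the 10% headroom reflects the concern about running near 90-100% capacity and is an arbitrary choice, and the fixed overhead for the OS and runtime buffers is a guess. The function name and defaults are hypothetical.

```python
def fits_in_memory(model_gib: float, kv_gib: float,
                   dram_gib: float, vram_gib: float,
                   headroom: float = 0.10,
                   overhead_gib: float = 8.0) -> bool:
    """Return True if weights + KV cache + overhead fit in the usable budget.

    headroom is the fraction of total memory kept free (assumed 10%);
    overhead_gib is an assumed allowance for the OS and runtime buffers.
    """
    usable = (dram_gib + vram_gib) * (1.0 - headroom)
    needed = model_gib + kv_gib + overhead_gib
    return needed <= usable


if __name__ == "__main__":
    # Hypothetical sizes in GiB, checked against the two planned builds.
    for dram in (256, 512):
        ok = fits_in_memory(model_gib=224, kv_gib=10, dram_gib=dram, vram_gib=48)
        print(f"{dram} GiB DRAM + 48 GiB VRAM: {'fits' if ok else 'does not fit'}")
```

Measured sizes should replace the estimated inputs once they are known, since the check is only as good as the numbers fed into it.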