Users report that hf-accelerate's model-memory-usage and NyxKrage's LLM VRAM Calculator are common tools for estimating VRAM and RAM requirements. The NyxKrage calculator is noted for accounting for the KV cache and for letting you set the quantization level and context length. Its estimates can still differ from actual usage depending on the model and the inference engine (such as llama.cpp or vLLM), because engines handle quantization and KV caching differently.
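To show why quantization and context length both matter, here is a rough back-of-the-envelope estimator. It is a minimal sketch under stated assumptions, not the formula used by either tool. It takes weight memory as parameters × bits per weight, and KV-cache memory as one K and one V tensor per layer, sized by context length, KV heads, and head dimension. The `ModelConfig` fields, the example architecture numbers, and the flat `overhead_frac` fudge factor are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelConfig:
    """Hypothetical architecture parameters for illustration (not read from a real model file)."""
    n_params: float   # total parameter count
    n_layers: int     # number of transformer layers
    n_kv_heads: int   # key/value heads (fewer than query heads when GQA is used)
    head_dim: int     # dimension of each attention head


def weights_bytes(cfg: ModelConfig, bits_per_weight: float) -> float:
    # Weight memory: parameter count times bits per weight, converted to bytes.
    return cfg.n_params * bits_per_weight / 8


def kv_cache_bytes(cfg: ModelConfig, context_len: int,
                   bytes_per_elem: float = 2.0, batch: int = 1) -> float:
    # One K and one V tensor per layer, each [batch, context_len, n_kv_heads, head_dim].
    # bytes_per_elem=2.0 assumes an fp16/bf16 KV cache.
    return (2 * cfg.n_layers * batch * context_len
            * cfg.n_kv_heads * cfg.head_dim * bytes_per_elem)


def estimate_gib(cfg: ModelConfig, bits_per_weight: float, context_len: int,
                 kv_bytes_per_elem: float = 2.0, overhead_frac: float = 0.10) -> float:
    # overhead_frac is an ASSUMED flat allowance for activations, buffers, and runtime
    # overhead; real engines allocate memory in their own ways.
    total = (weights_bytes(cfg, bits_per_weight)
             + kv_cache_bytes(cfg, context_len, kv_bytes_per_elem))
    return total * (1 + overhead_frac) / GIB


if __name__ == "__main__":
    # Illustrative 7B-parameter config with 32 layers, 32 KV heads, head_dim 128.
    cfg = ModelConfig(n_params=7e9, n_layers=32, n_kv_heads=32, head_dim=128)
    print(f"KV cache @ 4096 ctx: {kv_cache_bytes(cfg, 4096) / GIB:.2f} GiB")  # 2.00 GiB
    print(f"Total @ 4.5 bpw: {estimate_gib(cfg, 4.5, 4096):.2f} GiB")         # 6.23 GiB
```

In this model the KV cache grows linearly with context length and shrinks with fewer KV heads. That is why a calculator that ignores the cache can noticeably underestimate memory at long contexts.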
What tools do people use to estimate VRAM and RAM for local LLMs?