A user has shared a Bash script that parses the verbose output of llama.cpp and summarizes VRAM/RAM requirements and runtime performance metrics. Because memory needs are hard to predict across model quantizations, the tool groups the reported buffer allocations by function and backend so the totals can be read at a glance.

  • Parses llama.cpp verbose logs to extract buffer sizes grouped by function and backend.
  • Displays key statistics including tokens per second (t/s), context size, and multi-token prediction (MTP) acceptance rates.
  • Outputs data into separate TSV files for memory, stats, and model info.
  • Requires Linux and expects the llama.cpp command, run with the -v flag, to be wrapped in a run.sh script.
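
The grouping step described above can be illustrated with a minimal sketch. This is not the author's Bash script: it is a Python approximation, and the sample log lines, the regular expression, and the names parse_buffers, total_by_backend, and to_tsv are all assumptions made for illustration. The exact wording of llama.cpp's buffer lines differs between versions, so the pattern would likely need adjusting against a real log.

```python
import re
from collections import defaultdict

# Illustrative log lines only. The wording is an assumption and varies
# between llama.cpp versions; check it against your own -v output.
SAMPLE_LOG = """\
load_tensors:        CUDA0 model buffer size =  4096.00 MiB
load_tensors:   CPU_Mapped model buffer size =   512.50 MiB
llama_kv_cache:      CUDA0 KV buffer size =  1024.00 MiB
llama_context:      CUDA0 compute buffer size =   256.25 MiB
llama_context:  CUDA_Host compute buffer size =    16.00 MiB
"""

# Assumed shape: "<source>: <backend> <function> buffer size = <n> MiB"
BUFFER_RE = re.compile(
    r"^\S+:\s+(?P<backend>\S+)\s+(?P<function>.+?)"
    r"\s+buffer size\s*=\s*(?P<mib>[\d.]+)\s*MiB"
)


def parse_buffers(log_text):
    """Sum buffer sizes in MiB, keyed by (function, backend)."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        m = BUFFER_RE.match(line)
        if m:
            totals[(m["function"], m["backend"])] += float(m["mib"])
    return dict(totals)


def total_by_backend(totals):
    """Collapse the (function, backend) totals to one figure per backend."""
    out = defaultdict(float)
    for (_function, backend), mib in totals.items():
        out[backend] += mib
    return dict(out)


def to_tsv(totals):
    """Render the grouped totals as tab-separated text with a header row."""
    rows = ["function\tbackend\tMiB"]
    for (function, backend), mib in sorted(totals.items()):
        rows.append(f"{function}\t{backend}\t{mib:.2f}")
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    buffers = parse_buffers(SAMPLE_LOG)
    print(to_tsv(buffers), end="")
    print(total_by_backend(buffers))
```

Keying on the (function, backend) pair keeps model weights, KV cache, and compute buffers separate per device, which is what makes it possible to see how much of the total lands in VRAM versus system RAM.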

The script helps users on commodity hardware better understand their system's resource usage and plan model deployments accordingly.