Script to monitor llama.cpp and analyze memory usage

A user has shared a Bash script designed to parse the verbose output of llama.cpp, providing a clear summary of VRAM/RAM requirements and runtime performance metrics. This tool addresses the difficulty of predicting memory needs for various model quantizations by grouping buffer allocations by function and backend.

Parses llama.cpp verbose logs to extract buffer sizes grouped by function and backend.
Displays key statistics, including tokens per second (t/s), context size, and MTP (multi-token prediction) acceptance rates.
Writes the results to separate TSV files for memory, stats, and model info.
Requires Linux and expects the llama.cpp command to be wrapped in a run.sh script with the -v flag.
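
The grouping step described above can be illustrated with a minimal sketch. This is not the shared script itself: the function name summarize_buffers is hypothetical, and the sample log lines are illustrative stand-ins that assume a "<prefix>: <backend> <kind> buffer size = <number> MiB" layout, which may differ between llama.cpp versions.

```shell
#!/usr/bin/env bash
# Sketch only: sum "buffer size" lines from a llama.cpp-style verbose log,
# grouped by backend and buffer kind, and print the totals as TSV.
# ASSUMPTION: lines look like "<prefix>: <backend> <kind> buffer size = <n> MiB".

summarize_buffers() {   # hypothetical helper name; reads a log on stdin
    awk -v OFS='\t' '
        /buffer size *=/ {
            # Locate the "buffer size =" tokens; the two fields before them
            # are taken as backend and kind, the field after "=" as the size.
            for (i = 3; i <= NF - 3; i++) {
                if ($i == "buffer" && $(i+1) == "size" && $(i+2) == "=") {
                    key = $(i-2) OFS $(i-1)
                    total[key] += $(i+3)
                    break
                }
            }
        }
        END {
            for (key in total) printf "%s\t%.2f\n", key, total[key]
        }
    ' | LC_ALL=C sort
}

# Illustrative input, not captured from a real run.
summarize_buffers <<'EOF'
load_tensors:        CUDA0 model buffer size =  4096.00 MiB
load_tensors:          CPU model buffer size =   512.00 MiB
kv_cache:            CUDA0 KV buffer size =   512.00 MiB
kv_cache:            CUDA0 KV buffer size =   512.00 MiB
some unrelated verbose line
EOF
```

In practice the input would come from the verbose run instead of a heredoc, for example ./run.sh 2>&1 | summarize_buffers > memory.tsv, where memory.tsv is a placeholder file name. Summing in awk and sorting afterwards keeps the output order stable regardless of the order in which buffers are logged.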

The script helps users on commodity hardware better understand their system's resource usage and plan model deployments accordingly.