llama.cpp users can free GPU memory by disabling mmproj offload, switching to a lower-precision KV cache type, and adjusting spec-draft-n-max. Parameters such as --ctx-checkpoints and --fit-target have minimal impact on memory use. --parallel helps in multi-user setups but offers no benefit to a single user.
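
To give a rough sense of why the KV cache type matters, here is a minimal back-of-the-envelope sketch. It is not taken from llama.cpp itself: the function name `kv_cache_bytes` and the model shape are hypothetical, and the formula is the usual simplified estimate (one K and one V tensor per layer), so it ignores padding, sliding-window attention, and any other allocator overhead. The per-element sizes follow the ggml block layouts for f16, q8_0 (34 bytes per 32 values), and q4_0 (18 bytes per 32 values).

```python
# Approximate bytes per cached value for a few ggml types.
# f16 stores 2 bytes per value; q8_0 packs 32 values into 34 bytes
# (one fp16 scale plus 32 int8 values); q4_0 packs 32 values into 18 bytes.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}


def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, cache_type="f16"):
    """Estimate the combined K and V cache size in bytes for one sequence.

    Simplified estimate only: assumes every layer caches full-context K and V
    and that both tensors use the same cache type.
    """
    elements_per_tensor = n_layers * n_ctx * n_kv_heads * head_dim
    # Factor of 2 accounts for the separate K and V tensors.
    return 2 * elements_per_tensor * BYTES_PER_ELEMENT[cache_type]


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    shape = dict(n_layers=32, n_ctx=8192, n_kv_heads=8, head_dim=128)
    for cache_type in BYTES_PER_ELEMENT:
        gib = kv_cache_bytes(**shape, cache_type=cache_type) / 2**30
        print(f"{cache_type}: {gib:.2f} GiB")
```

Under these assumptions the estimate drops to roughly half when moving from f16 to q8_0, which is why the cache type is one of the more effective settings to change. Actual savings depend on the model architecture and the llama.cpp build in use.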