llama.cpp release b9673 adds optional USM (Unified Shared Memory) system allocations in the SYCL backend for GPU buffers of 1 GB or larger, which allows VRAM overcommit on devices that support it. The feature is disabled by default and is enabled by setting the GGML_SYCL_USM_SYSTEM environment variable; if the device does not support it, llama.cpp falls back to regular allocations.
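The opt-in and fallback behaviour described above can be pictured with a small C++ sketch. This is an illustration only, not llama.cpp's actual code: `AllocKind`, `kUsmSystemMinBytes`, `usm_system_requested` and `choose_alloc` are hypothetical names, the 1 GB threshold is assumed to mean 2^30 bytes, and the rule for which values of GGML_SYCL_USM_SYSTEM count as "enabled" is a guess.

```cpp
#include <cassert>   // used by the usage checks shown after the block
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical names throughout; this does not mirror llama.cpp internals.
enum class AllocKind { Regular, UsmSystem };

// Assumption: "1GB" is read as 2^30 bytes.
constexpr std::size_t kUsmSystemMinBytes = std::size_t{1} << 30;

// Assumption: any non-empty value other than "0" enables the feature.
bool usm_system_requested() {
    const char* value = std::getenv("GGML_SYCL_USM_SYSTEM");
    return value != nullptr && *value != '\0' && std::string(value) != "0";
}

// Picks the USM system path only when the user opted in, the device
// reports support, and the buffer meets the size threshold.
// Every other combination falls back to a regular allocation.
AllocKind choose_alloc(std::size_t bytes, bool requested, bool device_supports_usm_system) {
    if (requested && device_supports_usm_system && bytes >= kUsmSystemMinBytes) {
        return AllocKind::UsmSystem;
    }
    return AllocKind::Regular;
}
```

Under these assumptions, `choose_alloc(kUsmSystemMinBytes, usm_system_requested(), true)` yields `AllocKind::UsmSystem` only when the variable is set, while a smaller buffer or an unsupported device always gets `AllocKind::Regular`, matching the "disabled by default, falls back if unsupported" description.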
llama.cpp releases b9673 with USM system allocations and cross-platform binaries