llama.cpp b9788 adds SYCL tensor split support for Intel GPUs

llama.cpp release b9788 adds support for the --split-mode tensor option in the SYCL backend, which targets inference on Intel GPUs. The feature, implemented in pull request #24152 in the ggml-org repository, splits individual model tensors across multiple devices instead of relying solely on layer-based distribution, where each device holds whole layers. The release notes explicitly invite users with dual Intel GPU setups to test the new mode, and contributors are encouraged to share performance benchmarks to validate the improvements. The goal is better multi-GPU utilization on compatible Intel hardware.
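
The difference between layer-based and tensor-based splitting can be sketched with a toy example. This is a conceptual illustration only, not llama.cpp or SYCL code: plain Python lists stand in for device memory, the helper names (layer_split, tensor_split, sharded_matvec) are hypothetical, and the row-wise sharding shown here is an assumption about how a tensor might be divided, not a description of what pull request #24152 actually does.

```python
# Toy illustration only: plain Python lists stand in for device memory.
# None of these helpers exist in llama.cpp; all names are hypothetical.

def matvec(weight, x):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def layer_split(layers, num_devices):
    """Layer-based distribution: each device gets a contiguous block of whole layers."""
    per_device = -(-len(layers) // num_devices)  # ceiling division
    return [layers[i:i + per_device] for i in range(0, len(layers), per_device)]

def tensor_split(weight, num_devices):
    """Tensor-based distribution (assumed row-wise): each device gets a slice of one matrix."""
    per_device = -(-len(weight) // num_devices)  # ceiling division
    return [weight[i:i + per_device] for i in range(0, len(weight), per_device)]

def sharded_matvec(shards, x):
    """Each device computes its slice of the output; the slices are then concatenated."""
    out = []
    for shard in shards:
        out.extend(matvec(shard, x))
    return out

# With a layer split, a single layer's matrix lives entirely on one device.
placement = layer_split(["layer0", "layer1", "layer2", "layer3"], 2)

# With a tensor split, both devices work on the same matrix at once.
weight = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
shards = tensor_split(weight, 2)
assert sharded_matvec(shards, x) == matvec(weight, x)
```

In the layer-based case only one device is busy with any given layer, whereas in the tensor-based case every device contributes to each matrix multiplication. That is the intuition behind the hoped-for improvement in multi-GPU utilization, and it is why benchmarks from dual-GPU systems are being requested.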