A user reports that running llama.cpp in tensor split mode causes tool calls and reasoning traces to loop when serving Qwen 27B and Gemma 4 26B (MoE) models across one RTX 5080 and two RTX 5060 Ti GPUs.

  • The issue was observed specifically with the tensor split mode setting.
  • Models tested include Qwen 27B and Gemma 4 26B (MoE).
  • Hardware configuration involved one RTX 5080 and two RTX 5060 Ti cards.
  • Layer split mode, by contrast, worked correctly, with no looping.

The author wants to know whether this is a known issue and, if not, what causes the looping behavior.