A user reports that running llama.cpp in tensor split mode makes the model get stuck in loops during tool calls and reasoning traces. The behavior appears with Qwen 27B and Gemma 4 26B (MoE) models running across an RTX 5080 and two RTX 5060 Ti GPUs.
- The issue was observed specifically with the tensor split mode setting.
- Models tested include Qwen 27B and Gemma 4 26B (MoE).
- Hardware configuration involved one RTX 5080 and two RTX 5060 Ti cards.
- Layer split mode worked correctly on the same setup, with no looping.
The author wants to know whether this is a known issue, or otherwise to understand what causes the looping behavior.
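
When comparing output from the two split modes, it may help to measure the looping instead of judging it by eye. The following is a minimal sketch, not part of llama.cpp: `find_repeating_tail` is a hypothetical helper, and splitting the generated text on whitespace is an assumption standing in for real tokenization. It checks whether the end of an output is a single block repeated several times in a row.

```python
def find_repeating_tail(tokens, min_period=1, max_period=50, min_repeats=3):
    """Return (period, repeats) if `tokens` ends in a block of `period`
    items repeated at least `min_repeats` times in a row, else None.

    Hypothetical helper for illustration; thresholds are arbitrary defaults.
    """
    n = len(tokens)
    for period in range(min_period, max_period + 1):
        # Not enough tokens left to hold the required number of repeats.
        if period * min_repeats > n:
            break
        block = tokens[n - period:]
        repeats = 1
        # Walk backwards one block at a time while the block keeps matching.
        while (repeats + 1) * period <= n:
            start = n - (repeats + 1) * period
            end = n - repeats * period
            if tokens[start:end] != block:
                break
            repeats += 1
        if repeats >= min_repeats:
            return period, repeats
    return None


# Assumption: whitespace splitting approximates the model's token stream.
looping_output = "call tool call tool call tool".split()
normal_output = "the quick brown fox".split()

print(find_repeating_tail(looping_output))
print(find_repeating_tail(normal_output))
```

Running the same prompt under each split mode and passing both outputs through a check like this would give a rough, repeatable signal of whether only one mode degenerates into repetition.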