Tensor split performance on low-bandwidth (TB3) eGPUs, and a question
A user reports testing tensor split mode with two Morefine G1 4090M 16GB eGPUs connected via Thunderbolt 3 at 40 Gbps. Layer split mode yields high token rates for both prefill (PP) and text generation (TG). Tensor split mode keeps both cards saturated during TG, but PP performance is poor because the Thunderbolt link bandwidth saturates.
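A rough back-of-the-envelope sketch of why PP could hit the link limit while TG does not. Everything here is an assumption for illustration and not measured data: the model shape (`HIDDEN`, `LAYERS`), fp16 activations, two activation exchanges per layer, and the full nominal 40 Gbps being usable (the real PCIe throughput over Thunderbolt 3 is lower). It also ignores latency and protocol overhead, and it does not claim to match the communication pattern of any particular inference engine.

```python
def sync_bytes_per_token(hidden_size, n_layers, syncs_per_layer=2, bytes_per_value=2):
    """Activation bytes exchanged between GPUs for one token.

    Assumes each sync moves one hidden-size vector per token and that
    there are `syncs_per_layer` such exchanges in every layer.
    """
    return hidden_size * n_layers * syncs_per_layer * bytes_per_value


def bandwidth_ceiling_tokens_per_s(bytes_per_token, link_gbps):
    """Upper bound on tokens/s imposed by link bandwidth alone."""
    link_bytes_per_s = link_gbps * 1e9 / 8  # gigabits/s -> bytes/s
    return link_bytes_per_s / bytes_per_token


# Hypothetical model shape, chosen only to make the arithmetic concrete.
HIDDEN = 8192
LAYERS = 80
LINK_GBPS = 40.0  # nominal Thunderbolt 3 rate; usable bandwidth is lower

per_token = sync_bytes_per_token(HIDDEN, LAYERS)
ceiling = bandwidth_ceiling_tokens_per_s(per_token, LINK_GBPS)

print(f"bytes exchanged per token: {per_token}")
print(f"bandwidth-imposed ceiling: {ceiling:.0f} tokens/s")
```

Under these assumptions the link caps throughput at a rate in the low thousands of tokens per second. TG runs far below that, so the cards stay busy, whereas PP processes many tokens per batch and would otherwise run faster than the cap, so the link becomes the limit.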