A user reported that removing the GGML_CUDA_ALLREDUCE environment variable noticeably improved throughput (tokens per second, TPS) with MTP in local LLM inference. The variable had previously been considered beneficial, but unsetting it unexpectedly reduced overhead and improved performance, a result the user reached only after extensive configuration trials.
Finally seeing benefits of MTP after removing GGML_CUDA_ALLREDUCE
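
For anyone wanting to repeat the comparison, one way to run the same command with and without the variable is to build a copy of the environment that omits it. The sketch below is only an illustration: the helper name `env_without` and the sample environment are made up for this example, and the actual inference command is left to the reader.

```python
import os


def env_without(name, base=None):
    """Return a copy of an environment mapping with `name` removed.

    `base` defaults to the current process environment. The original
    mapping is never modified, and a missing variable is not an error.
    """
    env = dict(os.environ if base is None else base)
    env.pop(name, None)
    return env


# Made-up sample environment, used only to show the behaviour.
sample = {"GGML_CUDA_ALLREDUCE": "1", "PATH": "/usr/bin"}
cleaned = env_without("GGML_CUDA_ALLREDUCE", sample)
print(sorted(cleaned))
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` when launching your own inference command, so one run sees the variable and the other does not while everything else stays identical.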