Developer Requests Testing for MTP Support in GLM-4.7-Flash via llama.cpp
A developer is asking the community to help test Multi-Token Prediction (MTP) support for the GLM-4.7-Flash model in llama.cpp. The author acknowledges that models such as GLM Air and GLM Flash are outdated, but has a personal interest in enabling MTP for them. The request targets users who have hardware capable of running GLM-4.7-Flash and who can compile llama.cpp from source. Testers are asked to check whether the provided GGUF model works, report any issues they encounter, and measure and share the speedup that MTP delivers. The developer has uploaded the test model to a Hugging Face repository, where it is available now. Users who need smaller quantizations can contact the author directly for alternative versions.
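For testers reporting speed gains, one simple way to express the result is as a ratio of decode throughput with MTP enabled to throughput without it, measured on the same hardware, prompt, and quantization. The sketch below assumes you already have both tokens-per-second figures from your own runs. The function names `mtp_speedup` and `format_report` are hypothetical helpers for illustration, not part of llama.cpp.

```python
def mtp_speedup(baseline_tps: float, mtp_tps: float) -> float:
    """Return the speedup of an MTP run over a baseline run.

    Both arguments are decode throughputs in tokens per second, assumed to be
    measured on the same hardware, prompt, and quantization.
    """
    if baseline_tps <= 0 or mtp_tps <= 0:
        raise ValueError("throughputs must be positive")
    return mtp_tps / baseline_tps


def format_report(baseline_tps: float, mtp_tps: float) -> str:
    """Build a one-line summary suitable for pasting into a test report."""
    speedup = mtp_speedup(baseline_tps, mtp_tps)
    gain_pct = (speedup - 1.0) * 100.0  # relative gain over the baseline
    return (
        f"baseline {baseline_tps:.1f} t/s, MTP {mtp_tps:.1f} t/s, "
        f"speedup {speedup:.2f}x ({gain_pct:+.1f}%)"
    )


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not measured results.
    print(format_report(40.0, 52.0))
```

Reporting both raw throughputs alongside the ratio lets others compare results across different hardware.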