A speed test of GLM-5.2 quantized to UD-IQ1_M, run with llama.cpp, shows prefill at 579 tokens per second (t/s) with 8k context and 324 t/s with 57k context. Decode speed holds steady at 10.6 t/s for over 580 tokens, then drops to 9.37 t/s at 60k context.
GLM-5.2 UD-IQ1_M Speed Test on llama.cpp with 5090 and 3090 Ti
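To put the throughput figures in wall-clock terms, here is a rough sketch that converts them into waiting times. It rests on assumptions not stated in the post: "8k" and "57k" are read as 8,000 and 57,000 tokens, each reported rate is treated as an average over the whole run, and the helper name `seconds_for` is purely illustrative.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Estimate wall-clock seconds to process `tokens` at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


# Reported prefill rates, keyed by context size in tokens.
# Assumption: "k" means 1,000 tokens (it could also mean 1,024).
PREFILL_TPS = {8_000: 579.0, 57_000: 324.0}

# Reported decode rate for the steady run of a little over 580 tokens.
DECODE_TPS = 10.6

# Time to ingest a prompt of each size, assuming the rate is an average.
prefill_seconds = {ctx: seconds_for(ctx, tps) for ctx, tps in PREFILL_TPS.items()}

# Time to generate 580 tokens at the steady decode rate.
decode_seconds = seconds_for(580, DECODE_TPS)

for ctx, secs in prefill_seconds.items():
    print(f"prefill {ctx} tokens: {secs:.1f} s")
print(f"decode 580 tokens: {decode_seconds:.1f} s")
```

Under these assumptions, the 8k prompt takes about 14 seconds to ingest and the 57k prompt close to three minutes, while generating 580 tokens takes a little under a minute. These are estimates derived from the reported averages, not measurements.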