A speed test of GLM-5.2 quantized to UD-IQ1_M and run with llama.cpp shows prefill (prompt processing) at 579 t/s with 8k context, falling to 324 t/s at 57k context. Decode speed holds steady at 10.6 t/s over more than 580 generated tokens and drops to 9.37 t/s at 60k context.
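To compare how much each phase slows down as context grows, the small sketch below computes the relative throughput drop from the four figures reported above. It assumes the 10.6 t/s decode figure was measured at short context, which the test does not state explicitly. `pct_drop` is a hypothetical helper written for this illustration, not part of llama.cpp.

```python
# Relative throughput drop between a short-context and a long-context measurement.
# The four input figures are the ones reported in the speed test; nothing is measured here.

def pct_drop(short_ctx_tps: float, long_ctx_tps: float) -> float:
    """Return the percentage decrease from short_ctx_tps to long_ctx_tps."""
    return (short_ctx_tps - long_ctx_tps) / short_ctx_tps * 100.0

# Prefill: 8k context -> 57k context
prefill_drop = pct_drop(579.0, 324.0)
# Decode: assumed short-context baseline -> 60k context
decode_drop = pct_drop(10.6, 9.37)

print(f"prefill: -{prefill_drop:.1f}%")  # prints "prefill: -44.0%"
print(f"decode:  -{decode_drop:.1f}%")   # prints "decode:  -11.6%"
```

On these numbers, prefill loses roughly 44% of its throughput between 8k and 57k context, while decode loses only about 12% by 60k, so long contexts cost prompt processing far more than generation.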