media r/LocalLLaMA · 3d ago · open_models

GLM-5.2 UD-IQ1_M Speed Test on llama.cpp with 5090 and 3090 Ti


A llama.cpp speed test of GLM-5.2 quantized to UD-IQ1_M, run on a 5090 and a 3090 Ti, shows prefill throughput of 579 t/s at 8k context, falling to 324 t/s at 57k context. Decode speed holds steady at 10.6 t/s across more than 580 generated tokens, then drops to 9.37 t/s at 60k context.
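To put those rates in wall-clock terms, here is a rough back-of-the-envelope sketch. It assumes "8k" and "57k" mean 8,000 and 57,000 prompt tokens and that each quoted prefill rate is an average over the whole prompt; neither is confirmed by the post. The function names are just illustrative helpers.

```python
# Throughput figures quoted in the summary (tokens per second).
# Assumption: "8k" and "57k" are read as 8,000 and 57,000 prompt tokens.
PREFILL_TPS = {8_000: 579.0, 57_000: 324.0}
DECODE_TPS_SHORT = 10.6  # steady rate reported over 580+ generated tokens
DECODE_TPS_LONG = 9.37   # rate reported at 60k context


def seconds_to_process(tokens: int, tokens_per_second: float) -> float:
    """Time to process `tokens` at a constant average rate."""
    return tokens / tokens_per_second


def percent_drop(before: float, after: float) -> float:
    """Relative slowdown from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


for prompt_tokens, tps in PREFILL_TPS.items():
    wait = seconds_to_process(prompt_tokens, tps)
    print(f"prefill {prompt_tokens} tokens at {tps} t/s: {wait:.1f} s")

print(f"prefill slowdown: {percent_drop(579.0, 324.0):.1f}%")
print(f"decode slowdown:  {percent_drop(DECODE_TPS_SHORT, DECODE_TPS_LONG):.1f}%")
```

Under those assumptions, the 8k prompt takes about 14 seconds to ingest and the 57k prompt close to three minutes, while decode loses only around 12% of its speed at long context. Prefill is the part that degrades noticeably (roughly 44%).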

Importance 1/3 r/LocalLLaMA Code generation Inference efficiency
