Users reporting local GLM 5.2 inference speeds using llama.cpp on 6x RTX 3090 with 128GB DDR5 and an i7-13700K achieve 7.8 tokens/sec at 90K context size with Q8_0 quantization. Prompt processing occurs at approximately 40 tokens/sec.