GLM 5.2 now reaches prefill speeds above 100 tokens per second (t/s) at long context lengths. The update also reduces memory usage, so 4-bit quantized models can handle more than 100k tokens of context. The changes are described in a PR by the creator of oMLX.
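To put the prefill figure in practical terms, here is a back-of-the-envelope sketch in Python. It takes the two round numbers from the summary (100k prompt tokens, 100 t/s) and assumes the rate stays constant over the whole prompt, which is a simplification: real prefill speed usually varies with context length. The function name `prefill_seconds` is hypothetical and not part of any library.

```python
def prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Time to process a prompt, assuming a constant prefill rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second


# Round figures from the summary, used only as illustrative inputs.
seconds = prefill_seconds(100_000, 100.0)
minutes = seconds / 60
print(f"{seconds:.0f} s (~{minutes:.1f} min)")  # prints "1000 s (~16.7 min)"
```

If the rate holds at or above 100 t/s throughout, this is a rough upper bound on the wait before the first generated token for a 100k-token prompt.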
GLM 5.2 on Mac Studio Speedup PR