A user runs GLM-5.2 locally on a Dell PowerEdge R740 with dual Xeon 6248R CPUs and 768 GB of RAM, using ik_llama.cpp for faster CPU inference. After restricting inference to a single NUMA node, which gave the best performance, they reach 4–5.5 tokens per second in chat and about 3 tokens per second on coding tasks. They note that the model shows "frontier vibes" during code generation, although at these speeds it is of limited practical use on this hardware.
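The post does not give the exact command used, so the following is only a minimal sketch of what pinning a run to one NUMA node and reading the reported speeds might look like. It assumes `numactl` is installed and uses its `--cpunodebind` and `--membind` options; the binary name `./llama-server`, the model path, and the thread count of 24 are illustrative placeholders, not values from the source.

```python
import shlex


def numa_pinned_command(binary, model_path, node=0, threads=24):
    """Build a numactl-wrapped command that keeps threads and memory on one NUMA node.

    binary, model_path, and threads are hypothetical placeholders; adjust them
    to the actual ik_llama.cpp build and hardware.
    """
    return [
        "numactl",
        f"--cpunodebind={node}",  # run only on the CPUs of this node
        f"--membind={node}",      # allocate memory only from this node
        binary,
        "-m", model_path,         # model file to load
        "-t", str(threads),       # number of inference threads
    ]


def generation_seconds(num_tokens, tokens_per_second):
    """Estimate wall-clock time to generate num_tokens at a steady rate."""
    return num_tokens / tokens_per_second


cmd = numa_pinned_command("./llama-server", "/path/to/model.gguf")
print(shlex.join(cmd))

# At the reported ~3 tokens per second for coding, a 600-token answer
# takes roughly this many seconds:
print(generation_seconds(600, 3.0))
```

Because `--membind` restricts allocations to the chosen node, the model has to fit in that node's share of the RAM. The time estimate shows why the author describes the setup as having limited usability: even a moderately long coding answer takes minutes at this rate.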
Running GLM-5.2 Locally on CPU Only