GLM 5.2 Q1_S vs Qwen 27B Q8: A Local LLM Comparison

An amateur comparison on consumer hardware suggests that the heavily quantized GLM 5.2 (Q1_S) can outperform the higher-bit Qwen 3.6 27B (Q8) on a complex coding task, despite much slower inference.

The test involved building a self-contained Three.js 3D game on dual RTX 3090s using the pi harness.
Qwen 27B generated its code in ~2 minutes, but the game needed multiple follow-up prompts before it was playable.
GLM 5.2 Q1_S took hours and 75k tokens but produced a correct, polished game with sound in a single shot.
LLM judges (Opus 4.8 and GPT 5.5) rated GLM 5.2 Q1_S highest for code quality and instruction following.
Full-precision GLM 5.2 completed the task in only 11k tokens, but its game had inverted control keys, an error absent from the quantized version's output.
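
To put the "hours and 75k tokens" figure in perspective, the average throughput it implies can be estimated with simple arithmetic. The sketch below is illustrative only: the comparison reports "hours" without an exact duration, so the durations in the loop are hypothetical, the function name is made up for this example, and it assumes the 75k figure counts generated tokens.

```python
def implied_tokens_per_second(total_tokens: int, hours: float) -> float:
    """Average throughput implied by a token count and a wall-clock duration."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return total_tokens / (hours * 3600)


# Token count reported for the GLM 5.2 Q1_S run.
GLM_Q1_TOKENS = 75_000

# Hypothetical durations: the source only says "hours", not an exact figure.
for hours in (2, 3, 4):
    rate = implied_tokens_per_second(GLM_Q1_TOKENS, hours)
    print(f"{hours} h -> {rate:.1f} tokens/s")
```

Under any of these assumed durations the implied rate is roughly ten tokens per second or less, which is consistent with the report that the Q1_S run was far slower than the ~2 minute Qwen run.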

The results suggest that a heavily quantized large model can remain highly capable on complex, reasoning-heavy tasks, provided it is given room for extended thinking and the much longer run time is acceptable.