An amateur comparison on consumer hardware finds that the heavily quantized GLM-5.2 (Q1_S) outperformed the higher-bit Qwen 3.6 27B (Q8) on a complex coding task, despite much slower inference.

  • The task was to build a self-contained Three.js 3D game; the test ran on dual RTX 3090s using the pi harness.
  • Qwen 3.6 27B generated its code in ~2 minutes, but the game needed several follow-up prompts before it was playable.
  • GLM-5.2 Q1_S took hours and consumed 75k tokens, but produced a correct, polished game, including sound, in a single shot.
  • LLM judges (Opus 4.8 and GPT 5.5) rated GLM-5.2 Q1_S highest for code quality and instruction following.
  • Full-precision GLM-5.2 completed the task in only 11k tokens, but its game had inverted control keys, an error absent from the quantized version's output.

The results suggest that even low-bit quantization can leave a model highly capable for specific use cases, such as complex reasoning tasks, provided the model is allowed to make full use of its extended thinking process.