GLM-5.2 surpasses GPT-5.4 and the entire Gemini lineup in coding performance on the DeepSWE benchmark. However, it requires significantly more output tokens, which makes it substantially less cost-efficient per task than models such as GPT-5.5 and Claude Opus 4.8.
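The cost-per-task point is simple arithmetic: tokens consumed multiplied by the price per token. The sketch below is only an illustration; the function name `cost_per_task`, the token counts, and the prices are all hypothetical and are not DeepSWE measurements or real vendor pricing.

```python
def cost_per_task(input_tokens, output_tokens,
                  input_price_per_mtok, output_price_per_mtok):
    """Dollar cost of one task, given token counts and prices per million tokens."""
    total = (input_tokens * input_price_per_mtok
             + output_tokens * output_price_per_mtok)
    return total / 1_000_000


# Hypothetical numbers chosen for illustration only.
# A verbose model with cheap tokens that emits 60k output tokens per task:
verbose = cost_per_task(20_000, 60_000,
                        input_price_per_mtok=1.0, output_price_per_mtok=4.0)
# A concise model with tokens twice as expensive that emits 15k output tokens:
concise = cost_per_task(20_000, 15_000,
                        input_price_per_mtok=2.0, output_price_per_mtok=8.0)

print(f"verbose: ${verbose:.2f} per task, concise: ${concise:.2f} per task")
```

Under these assumed figures, the model with the lower per-token price still costs more per task, because output volume dominates the bill.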
media · r/LocalLLaMA · 4d ago · open_models
GLM-5.2 Beats Gemini and GPT-5.4 in Coding but Is Inefficient
Importance 2/3
Beats a top-lab benchmark
Zhipu AI
Mistral AI
OpenAI
Code generation
Evaluation & benchmarks
Open weights
Benchmarks
| Benchmark | Model | Score |
|---|---|---|
| SWE-bench Verified | GLM-5.2 | 0% |