GLM-5.2 surpasses GPT-5.4 and the entire Gemini lineup in coding performance on the DeepSWE benchmark. However, it requires significantly more output tokens, making it substantially less efficient in terms of cost-per-task compared to models like GPT-5.5 and Claude Opus 4.8.