Benchmark · agentic
SWE-rebench
A continuously refreshed variant of SWE-bench, designed to reduce training-data contamination.
- 2026-07-05 Claude Opus 4.8 xhigh 56.5% SWE-rebench leaderboard adds GLM-5.2, Qwen3.6, and Gemma 4 and improves the interface
- 2026-07-05 GLM-5.2 51.1% SWE-rebench leaderboard adds GLM-5.2, Qwen3.6, and Gemma 4 and improves the interface
- 2026-07-05 Gemini 3.5 Flash 49.5% SWE-rebench leaderboard adds GLM-5.2, Qwen3.6, and Gemma 4 and improves the interface
- 2026-07-05 MiniMax M3 45.6% SWE-rebench leaderboard adds GLM-5.2, Qwen3.6, and Gemma 4 and improves the interface
- 2026-07-05 DeepSeek-V4 Pro 42.7% SWE-rebench leaderboard adds GLM-5.2, Qwen3.6, and Gemma 4 and improves the interface
- 2026-07-05 MiMo V2.5 Pro 42.4% SWE-rebench leaderboard adds GLM-5.2, Qwen3.6, and Gemma 4 and improves the interface
- 2026-07-05 DeepSeek-V4 Flash 38.4% SWE-rebench leaderboard adds GLM-5.2, Qwen3.6, and Gemma 4 and improves the interface
- 2026-07-05 Qwen3.6-27B 36.5% SWE-rebench leaderboard adds GLM-5.2, Qwen3.6, and Gemma 4 and improves the interface
- 2026-07-05 Qwen3.6-35B-A3B 33.8% SWE-rebench leaderboard adds GLM-5.2, Qwen3.6, and Gemma 4 and improves the interface
- 2026-07-05 Gemma 4 31B 16.5% SWE-rebench leaderboard adds GLM-5.2, Qwen3.6, and Gemma 4 and improves the interface