Benchmark · agentic

SWE-rebench

A continuously refreshed SWE-bench variant, designed to reduce training-data contamination.

10 results · 10 models

Claude Opus 4.8 xhigh · GLM-5.2 · Gemini 3.5 Flash · MiniMax M3 · DeepSeek-V4 Pro · MiMo V2.5 Pro · DeepSeek-V4 Flash · Qwen3.6-27B · Qwen3.6-35B-A3B · Gemma 4 31B

Timeline

Source for all entries: "SWE-rebench leaderboard adds GLM-5.2, Qwen3.6, Gemma 4 and improves UI"

Date        Model                  Score
2026-07-05  Claude Opus 4.8 xhigh  56.5%
2026-07-05  GLM-5.2                51.1%
2026-07-05  Gemini 3.5 Flash       49.5%
2026-07-05  MiniMax M3             45.6%
2026-07-05  DeepSeek-V4 Pro        42.7%
2026-07-05  MiMo V2.5 Pro          42.4%
2026-07-05  DeepSeek-V4 Flash      38.4%
2026-07-05  Qwen3.6-27B            36.5%
2026-07-05  Qwen3.6-35B-A3B        33.8%
2026-07-05  Gemma 4 31B            16.5%