Benchmark · agentic
SWE-rebench
A continuously refreshed variant of SWE-bench, designed to reduce data contamination.
Source for all entries: "SWE-rebench leaderboard adds GLM-5.2, Qwen3.6, Gemma 4 and improves UI"
- 2026-07-05 · Claude Opus 4.8 xhigh · 56.5%
- 2026-07-05 · GLM-5.2 · 51.1%
- 2026-07-05 · Gemini 3.5 Flash · 49.5%
- 2026-07-05 · MiniMax M3 · 45.6%
- 2026-07-05 · DeepSeek-V4 Pro · 42.7%
- 2026-07-05 · MiMo V2.5 Pro · 42.4%
- 2026-07-05 · DeepSeek-V4 Flash · 38.4%
- 2026-07-05 · Qwen3.6-27B · 36.5%
- 2026-07-05 · Qwen3.6-35B-A3B · 33.8%
- 2026-07-05 · Gemma 4 31B · 16.5%