The SWE-rebench leaderboard has been updated with new model entries and a redesigned interface that makes results easier to compare.
- Claude Opus 4.8 xhigh leads with 56.5% resolution using 2.48M tokens.
- GLM-5.2 achieves 51.1% with 2.62M tokens.
- Gemini 3.5 Flash scores 49.5% using 1.85M tokens.
- MiniMax M3 reaches 45.6% with 6.89M tokens.
- DeepSeek-V4 Pro attains 42.7% using 2.25M tokens.
- MiMo V2.5 Pro scores 42.4% with 2.59M tokens.
- DeepSeek-V4 Flash achieves 38.4% using 3.00M tokens.
- Qwen3.6-27B reaches 36.5% with 1.88M tokens.
- Qwen3.6-35B-A3B scores 33.8% using 2.23M tokens.
- Gemma 4 31B achieves 16.5% with 2.24M tokens.
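The update highlights local and self-hosted models and singles out Qwen3.6-27B as particularly strong for its size. In the list above, its 36.5% puts it ahead of both Qwen3.6-35B-A3B (33.8%) and Gemma 4 31B (16.5%).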
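Because the entries pair a resolution rate with a token count, one rough way to compare them is resolved percentage points per million tokens. The Python sketch below uses only the figures listed above. The ratio is an illustrative metric, not one the leaderboard reports, and the helper names are hypothetical. The summary does not say whether the token counts are per task or per run, so the ratio is meaningful only as a relative comparison between these entries.

```python
# (model, resolved %, tokens in millions), copied from the list above.
RESULTS = [
    ("Claude Opus 4.8 xhigh", 56.5, 2.48),
    ("GLM-5.2", 51.1, 2.62),
    ("Gemini 3.5 Flash", 49.5, 1.85),
    ("MiniMax M3", 45.6, 6.89),
    ("DeepSeek-V4 Pro", 42.7, 2.25),
    ("MiMo V2.5 Pro", 42.4, 2.59),
    ("DeepSeek-V4 Flash", 38.4, 3.00),
    ("Qwen3.6-27B", 36.5, 1.88),
    ("Qwen3.6-35B-A3B", 33.8, 2.23),
    ("Gemma 4 31B", 16.5, 2.24),
]


def points_per_million_tokens(resolved_pct, tokens_millions):
    """Illustrative efficiency ratio (an assumption, not a leaderboard metric)."""
    return resolved_pct / tokens_millions


def rank_by_efficiency(results):
    """Return the entries sorted from most to least efficient under that ratio."""
    return sorted(
        results,
        key=lambda row: points_per_million_tokens(row[1], row[2]),
        reverse=True,
    )


if __name__ == "__main__":
    for name, pct, tokens in rank_by_efficiency(RESULTS):
        ratio = points_per_million_tokens(pct, tokens)
        print(f"{name:<24}{pct:>6.1f}%{tokens:>7.2f}M{ratio:>8.2f}")
```

On these figures, Gemini 3.5 Flash ranks first under this ratio and MiniMax M3 last, which reflects MiniMax M3's much higher token usage rather than its resolution rate.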