The SWE-rebench leaderboard has been updated with new model entries and a redesigned interface that makes results easier to compare.

  • Claude Opus 4.8 xhigh leads with 56.5% resolution using 2.48M tokens.
  • GLM-5.2 achieves 51.1% with 2.62M tokens.
  • Gemini 3.5 Flash scores 49.5% using 1.85M tokens.
  • MiniMax M3 reaches 45.6% with 6.89M tokens.
  • DeepSeek-V4 Pro attains 42.7% using 2.25M tokens.
  • MiMo V2.5 Pro scores 42.4% with 2.59M tokens.
  • DeepSeek-V4 Flash achieves 38.4% using 3.00M tokens.
  • Qwen3.6-27B reaches 36.5% with 1.88M tokens.
  • Qwen3.6-35B-A3B scores 33.8% using 2.23M tokens.
  • Gemma 4 31B achieves 16.5% with 2.24M tokens.

The update highlights local and self-hosted models, noting Qwen3.6-27B as particularly strong for its size.
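
One way to read the figures above with cost in mind is to divide each resolution rate by its token usage. The sketch below does that in Python using only the numbers from the list. The ratio (percentage points resolved per million tokens) and the names `RESULTS` and `rank_by_efficiency` are illustrative assumptions, not metrics or identifiers published by the leaderboard.

```python
# Figures copied from the list above: (model, resolved %, tokens in millions).
RESULTS = [
    ("Claude Opus 4.8 xhigh", 56.5, 2.48),
    ("GLM-5.2", 51.1, 2.62),
    ("Gemini 3.5 Flash", 49.5, 1.85),
    ("MiniMax M3", 45.6, 6.89),
    ("DeepSeek-V4 Pro", 42.7, 2.25),
    ("MiMo V2.5 Pro", 42.4, 2.59),
    ("DeepSeek-V4 Flash", 38.4, 3.00),
    ("Qwen3.6-27B", 36.5, 1.88),
    ("Qwen3.6-35B-A3B", 33.8, 2.23),
    ("Gemma 4 31B", 16.5, 2.24),
]


def rank_by_efficiency(results):
    """Sort models by resolved percentage points per million tokens, highest first.

    This ratio is a hypothetical convenience metric, not a leaderboard column.
    """
    scored = [(name, pct / mtok) for name, pct, mtok in results]
    return sorted(scored, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_by_efficiency(RESULTS):
        print(f"{name:<24} {score:6.2f}")
```

By this ratio Gemini 3.5 Flash ranks first and MiniMax M3 last, while Qwen3.6-27B places fourth, just behind GLM-5.2. The ratio ignores per-token pricing and hardware cost, so it is only a rough complement to the raw resolution rates.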