SWE-rebench leaderboard adds GLM-5.2, Qwen3.6, Gemma 4 and improves UI

The SWE-rebench leaderboard has been updated with new model entries and a redesigned user interface that makes results easier to compare.

The update highlights local and self-hosted models and singles out Qwen3.6-27B, which scores 36.5%, as particularly strong for its size.

Benchmarks

Benchmark    Model                  Score
SWE-rebench  Claude Opus 4.8 xhigh  56.5%
SWE-rebench  GLM-5.2                51.1%
SWE-rebench  Gemini 3.5 Flash       49.5%
SWE-rebench  MiniMax M3             45.6%
SWE-rebench  DeepSeek-V4 Pro        42.7%
SWE-rebench  MiMo V2.5 Pro          42.4%
SWE-rebench  DeepSeek-V4 Flash      38.4%
SWE-rebench  Qwen3.6-27B            36.5%
SWE-rebench  Qwen3.6-35B-A3B        33.8%
SWE-rebench  Gemma 4 31B            16.5%
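
For readers who want to compare the entries programmatically, the following is a minimal sketch in Python that ranks the listed scores and reports each model's gap to the top entry in percentage points. The scores are copied from the figures listed here; the `scores` dictionary and the `gaps_to_leader` helper are illustrative names, not part of the leaderboard or any published API.

```python
# Scores (in percent) copied from the SWE-rebench figures listed above.
# The dictionary and helper below are hypothetical, for illustration only.
scores = {
    "Claude Opus 4.8 xhigh": 56.5,
    "GLM-5.2": 51.1,
    "Gemini 3.5 Flash": 49.5,
    "MiniMax M3": 45.6,
    "DeepSeek-V4 Pro": 42.7,
    "MiMo V2.5 Pro": 42.4,
    "DeepSeek-V4 Flash": 38.4,
    "Qwen3.6-27B": 36.5,
    "Qwen3.6-35B-A3B": 33.8,
    "Gemma 4 31B": 16.5,
}


def gaps_to_leader(scores):
    """Return (model, score, gap) tuples sorted by score, highest first.

    The gap is the difference to the top score in percentage points,
    rounded to one decimal place.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[0][1]
    return [(name, score, round(top - score, 1)) for name, score in ranked]


for name, score, gap in gaps_to_leader(scores):
    print(f"{name:<24}{score:>6.1f}%  gap {gap:.1f} pts")
```

The gap is expressed in percentage points rather than as a relative change, which keeps it directly comparable to the scores as published.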