Benchmark · agentic

Terminal-Bench

Real-world terminal/CLI agent tasks.

4 results · 4 models
[Chart: Terminal-Bench score by result date, 2026-06-16 to 2026-06-23; y-axis 0 to 86. Series: GLM-5.2, Qwen3.6-27b-FP8, Tmax, Tmax-27B.]
Timeline
  1. 2026-06-23 · Tmax-27B · 43.0% · Tmax-27B Terminal Agent for Small GPUs with DPPO Training
  2. 2026-06-23 · Tmax · 27.0% · Tmax: A Simple RL Recipe for Terminal Agents
  3. 2026-06-20 · Qwen3.6-27b-FP8 · 55.67 tok/s · $1800 GPU cost runs Qwen3.6-27B with 262K context and 55 tok/s
  4. 2026-06-16 · GLM-5.2 · 80.0% · GLM-5.2 crosses 80% on Terminal-Bench