Benchmark · agentic
Terminal-Bench
Real-world terminal/CLI agent tasks.
- 2026-06-23 · Tmax-27B · 43.0% · Tmax-27B Terminal Agent for Small GPUs with DPPO Training
- 2026-06-23 · Tmax · 27.0% · Tmax: A Simple RL Recipe for Terminal Agents
- 2026-06-20 · Qwen3.6-27b-FP8 · 55.67 tok/s · $1800 GPU cost runs Qwen3.6-27B with 262K context and 55 tok/s
- 2026-06-16 · GLM-5.2 · 80.0% · GLM-5.2 crosses 80% on Terminal-Bench