Benchmark · coding

HumanEval+

saturated · 2 results · 1 model
[Chart: results by date; DeepSeek-Coder-1.3B scored 12 on 2026-06-16]
Timeline
  1. 2026-06-16 · DeepSeek-Coder-1.3B · 12.0 tasks · Post-Hoc Operators Fail to Improve Accuracy in Small Code Models
  2. 2026-06-16 · DeepSeek-Coder-1.3B · 12.0 tasks · Post-Hoc Falsification Operators Fail to Improve Accuracy in Small Code Models