korshunov.ai
Benchmark · reasoning
GPQA Diamond
1 result
1 model
30B model · 0.0% · 2026-06-24
Timeline
2026-06-24 · 30B model · 0.0%
CALIBER: Calibrating Confidence Before and After Reasoning in Language Models