CALIBER is a method that elicits and supervises a model's confidence estimates at two stages: before reasoning and after reasoning. For a 7B model on BigMathDigits, it reduces Expected Calibration Error (ECE) by 52.5% while also achieving the best Brier score and AUROC among the compared methods, and it performs best on out-of-distribution benchmarks such as GPQA and TriviaQA.
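To make the three reported metrics concrete, here is a minimal sketch of how ECE, Brier score, and AUROC can be computed from per-question confidences and binary correctness labels. It is not CALIBER's evaluation code. It assumes 10 equal-width bins for ECE, defines AUROC as the probability that a correct answer receives higher confidence than an incorrect one (ties count as one half), and reads the 52.5% figure as a relative reduction. All function names and the toy data are hypothetical.

```python
from typing import Sequence


def expected_calibration_error(conf: Sequence[float], correct: Sequence[int], n_bins: int = 10) -> float:
    """Equal-width binned ECE (assumed binning): sum over bins of
    (bin size / n) * |accuracy in bin - mean confidence in bin|."""
    n = len(conf)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(conf, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece


def brier_score(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared difference between confidence and the 0/1 correctness label."""
    return sum((c - y) ** 2 for c, y in zip(conf, correct)) / len(conf)


def auroc(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Probability that a random correct answer is more confident than a random
    incorrect one; ties count as 0.5 (pairwise form of the Mann-Whitney statistic)."""
    pos = [c for c, y in zip(conf, correct) if y == 1]
    neg = [c for c, y in zip(conf, correct) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one incorrect answer")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional drop from baseline to new (assumed meaning of 'reduces by X%')."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # Hypothetical toy data, not results from the paper.
    conf = [0.95, 0.85, 0.75, 0.65, 0.35, 0.25]
    correct = [1, 1, 0, 1, 0, 0]
    print(f"ECE   = {expected_calibration_error(conf, correct):.4f}")
    print(f"Brier = {brier_score(conf, correct):.4f}")
    print(f"AUROC = {auroc(conf, correct):.4f}")
    # Under the relative reading, a 52.5% cut from a hypothetical ECE of 0.20 gives 0.095.
    print(f"reduction = {relative_reduction(0.20, 0.095):.3f}")
```

ECE and Brier score are lower-is-better and measure how well stated confidence matches actual accuracy, whereas AUROC is higher-is-better and only measures whether confidence ranks correct answers above incorrect ones. This is why a method can be reported as improving all three at once.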