This study evaluates how resampling methods such as SMOTE and random undersampling affect probability calibration in tree ensembles, finding that SMOTE's calibration cost is small whereas undersampling severely degrades calibration.
- SMOTE raises Expected Calibration Error (ECE) only slightly, by 0.009, across imbalance ratios from 1.9 to 70, and its discrimination gains typically outweigh this penalty.
- Random undersampling inflates ECE to as much as 0.395 on high-imbalance datasets because the resulting training sets are too small for reliable probability estimation.
- A single post-hoc recalibration step using Platt or isotonic regression reduces ECE by up to 66% with negligible loss in ranking power (AUC -0.002).
- Analytic prior-shift correction fails for SMOTE because SMOTE distorts the class-conditional densities rather than only the class prior, so data-driven recalibration is needed.
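
The reasoning in the last bullet can be made concrete with the standard prior-shift identity. This is a sketch under the assumption that resampling changes only the class prior. The notation is illustrative and not taken from the study: π is the original positive-class prior, π' the prior after resampling, and p'(x) the positive-class posterior estimated on the resampled data.

```latex
p(y=1 \mid x) \;=\;
\frac{\dfrac{\pi}{\pi'}\, p'(x)}
     {\dfrac{\pi}{\pi'}\, p'(x) \;+\; \dfrac{1-\pi}{1-\pi'}\,\bigl(1 - p'(x)\bigr)}
```

The identity follows from Bayes' rule only if the class-conditional densities p(x | y) are the same before and after resampling. Random undersampling preserves them in expectation, whereas SMOTE's interpolated minority points alter p(x | y=1), so the ratio of priors alone no longer maps p'(x) back to the true posterior.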
The authors recommend that imbalanced-learning studies report calibration metrics alongside discrimination and advise practitioners to recalibrate after resampling whenever predicted probabilities drive decisions.
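
As an illustration of the recommended workflow, the following standard-library Python sketch computes a binned ECE and applies isotonic recalibration on a held-out split. Several things here are assumptions rather than details from the study: the equal-width 10-bin ECE variant, the step-function isotonic fit (pool-adjacent-violators), the hypothetical helper names, and the synthetic scores in `demo`, which mimic an over-confident model trained on rebalanced data. It does not reproduce the reported figures.

```python
import random
from bisect import bisect_left
from itertools import groupby


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Equal-width binned ECE: weighted mean of |observed rate - mean score| per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        frac_pos = sum(y for y, _ in b) / len(b)
        mean_conf = sum(p for _, p in b) / len(b)
        ece += len(b) / n * abs(frac_pos - mean_conf)
    return ece


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit; returns (upper score bounds, block values)."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block: [sum of labels, count, largest score in block]
    for s, grp in groupby(pairs, key=lambda t: t[0]):  # tied scores share a block
        ys = [y for _, y in grp]
        blocks.append([float(sum(ys)), len(ys), s])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, hi = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = hi
    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values


def apply_isotonic(model, scores):
    """Map each raw score to the value of the block that covers it."""
    uppers, values = model
    return [values[min(bisect_left(uppers, s), len(values) - 1)] for s in scores]


def demo(seed=0, n=4000, w=10.0):
    """Synthetic example: scores inflated by a hypothetical odds factor w."""
    rng = random.Random(seed)
    scores, labels = [], []
    for _ in range(n):
        q = rng.uniform(0.0, 0.2)                 # true minority-class probability
        scores.append(w * q / (w * q + 1.0 - q))  # over-confident raw score
        labels.append(1 if rng.random() < q else 0)
    half = n // 2
    # Fit the calibrator on a held-out split drawn from the original distribution.
    model = fit_isotonic(scores[:half], labels[:half])
    recalibrated = apply_isotonic(model, scores[half:])
    before = expected_calibration_error(labels[half:], scores[half:])
    after = expected_calibration_error(labels[half:], recalibrated)
    return before, after


before, after = demo()
print(f"ECE before recalibration: {before:.3f}, after: {after:.3f}")
```

The calibrator is fit on data that was not resampled, since the goal is to map scores back to probabilities under the original class distribution. Because the isotonic map is non-decreasing, it can merge neighbouring scores into ties but never reverses their order, which is in line with the small AUC change reported above. In practice a library routine such as scikit-learn's `CalibratedClassifierCV` would likely replace these hand-written helpers.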