The Hidden Cost of Resampling: How Imbalance Correction Degrades Probability Calibration in Tree Ensembles

This study evaluates how resampling methods such as SMOTE and random undersampling affect probability calibration in tree ensembles. It finds that SMOTE's calibration cost is small, whereas random undersampling severely degrades calibration.

SMOTE raises Expected Calibration Error (ECE) by only 0.009 across imbalance ratios from 1.9 to 70, and its discrimination gains typically outweigh this penalty.
Random undersampling inflates ECE to as much as 0.395 on high-imbalance datasets, because the reduced training sets are too small for reliable probability estimation.
A single post-hoc recalibration step using Platt or isotonic regression reduces ECE by up to 66% with negligible loss in ranking power (AUC -0.002).
Analytic prior-shift correction fails for SMOTE because SMOTE distorts the class-conditional densities, not just the class prior, so data-driven recalibration is needed instead.
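
To make the two quantities in these findings concrete, here is a minimal sketch of a binary ECE and of the textbook analytic prior-shift correction. It rests on assumptions not stated in the summary: ECE is computed with equal-width bins on the predicted positive-class probability (the authors' binning scheme is not given), and the correction is the standard Bayes rescaling of the posterior odds, which is valid only when the class prior is the sole thing that changed. The function names are illustrative.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE using equal-width bins over the predicted positive-class probability.

    Assumption: equal-width binning; other binning schemes give different values.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        # Weight each bin's |confidence - observed frequency| gap by its share of samples.
        ece += len(members) / n * abs(mean_conf - frac_pos)
    return ece


def prior_shift_correct(p_resampled, train_prior, true_prior):
    """Rescale a probability learned under train_prior to the deployment true_prior.

    Assumption: only the class prior differs between training and deployment;
    the class-conditional densities are taken to be unchanged.
    """
    pos = p_resampled * true_prior / train_prior
    neg = (1.0 - p_resampled) * (1.0 - true_prior) / (1.0 - train_prior)
    return pos / (pos + neg)
```

For example, a score of 0.5 from a model trained on a balanced (prior 0.5) set maps back to 0.1 when the true positive rate is 10%. The correction assumes the resampled training distribution differs from the original only in its prior, which is the assumption the finding above says SMOTE violates.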

The authors recommend that imbalanced-learning studies report calibration metrics alongside discrimination and advise practitioners to recalibrate after resampling whenever predicted probabilities drive decisions.
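
As a rough illustration of the recalibration step, the sketch below fits an isotonic (monotone step-function) map from raw scores to calibrated probabilities using the pool-adjacent-violators algorithm. It is not the authors' implementation. It assumes the calibrator is fitted on held-out scores and labels drawn from the original, non-resampled distribution, and the names fit_isotonic and apply_isotonic are hypothetical.

```python
from bisect import bisect_left


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function from scores to probabilities (PAV).

    Returns (uppers, values): scores <= uppers[i] map to values[i].
    """
    # Pool tied scores first so each distinct score starts as one block.
    pooled = {}
    for s, y in zip(scores, labels):
        label_sum, count = pooled.get(s, (0.0, 0))
        pooled[s] = (label_sum + y, count + 1)

    blocks = []  # each block: [label_sum, count, largest_score_in_block]
    for s in sorted(pooled):
        label_sum, count = pooled[s]
        blocks.append([label_sum, count, s])
        # Merge backwards while the previous block's mean exceeds the current one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            last_sum, last_count, last_hi = blocks.pop()
            blocks[-1][0] += last_sum
            blocks[-1][1] += last_count
            blocks[-1][2] = last_hi

    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values


def apply_isotonic(score, uppers, values):
    """Map a raw score to its calibrated probability; scores above the range use the last block."""
    idx = bisect_left(uppers, score)
    return values[min(idx, len(values) - 1)]


# Hypothetical held-out scores from a model trained on resampled data.
uppers, values = fit_isotonic([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
calibrated = apply_isotonic(0.25, uppers, values)
```

Because the fitted map is non-decreasing, it never reverses the order of two scores. It can introduce ties, which is one way a small AUC change can arise after recalibration.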