This study evaluates how resampling methods such as SMOTE and random undersampling affect probability calibration in tree ensembles. It finds that SMOTE's calibration cost is small, whereas random undersampling severely degrades calibration.

  • SMOTE increases Expected Calibration Error (ECE) by only 0.009 across imbalance ratios from 1.9 to 70, and its discrimination gains typically outweigh this penalty.
  • Random undersampling inflates ECE to as much as 0.395 on high-imbalance datasets, because the resulting training sets are too small for reliable probability estimation.
  • A single post-hoc recalibration step (Platt scaling or isotonic regression) reduces ECE by up to 66% with negligible loss in ranking power (AUC change of -0.002).
  • Analytic prior-shift correction fails for SMOTE because SMOTE distorts the class-conditional densities rather than just the class prior, so data-driven recalibration is needed.
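
To make the two quantities in the findings above concrete, here is a minimal sketch in plain Python. It assumes a binary, positive-class ECE with equal-width bins (the summary does not state the study's binning scheme, so `n_bins=10` is an assumption), and it uses the standard closed-form prior-shift correction. The function names `expected_calibration_error` and `prior_shift_correct` are illustrative, not taken from the study.

```python
def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binned ECE: sum over bins of (n_b / N) * |positive rate_b - mean predicted prob_b|.

    Assumption: equal-width bins on [0, 1]; the study may bin differently.
    """
    n = len(y_true)
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((y, p))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        frac_pos = sum(y for y, _ in members) / len(members)
        mean_prob = sum(p for _, p in members) / len(members)
        ece += len(members) / n * abs(frac_pos - mean_prob)
    return ece


def prior_shift_correct(p, train_prior, true_prior):
    """Map a probability learned at positive rate `train_prior` back to `true_prior`.

    This is exact only when resampling changed the class prior and nothing else,
    which is the assumption the findings above say SMOTE violates.
    """
    pos = p * true_prior / train_prior
    neg = (1.0 - p) * (1.0 - true_prior) / (1.0 - train_prior)
    return pos / (pos + neg)
```

For instance, `prior_shift_correct(0.5, 0.5, 0.1)` maps a score of 0.5 from a model trained on a balanced resample back to 0.1 when the true positive rate is 10%. Because the correction depends only on the two priors, it cannot undo any change SMOTE makes to the shape of the minority-class distribution.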

The authors recommend that imbalanced-learning studies report calibration metrics alongside discrimination and advise practitioners to recalibrate after resampling whenever predicted probabilities drive decisions.
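
As an illustration of the recommended recalibration step, the sketch below fits an isotonic (monotone step-function) map with the pool-adjacent-violators algorithm in plain Python. It assumes the map is fit on a held-out split that keeps the original class ratio, which is a common practice but is not stated in this summary. The names `fit_isotonic` and `apply_isotonic` are hypothetical, and this is not the authors' implementation; library versions typically interpolate between steps rather than returning a pure step function.

```python
from bisect import bisect_left


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function from model scores to observed positive rates."""
    # Pool tied scores first so each distinct score maps to a single value.
    pooled = {}
    for s, y in zip(scores, labels):
        total, count = pooled.get(s, (0.0, 0))
        pooled[s] = (total + y, count + 1)

    blocks = []  # each block: [label_sum, count, largest_score_in_block]
    for s in sorted(pooled):
        total, count = pooled[s]
        blocks.append([total, count, s])
        # Merge backwards while the previous block's mean exceeds the current one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top

    uppers = [b[2] for b in blocks]         # upper score bound of each step
    values = [b[0] / b[1] for b in blocks]  # calibrated probability of each step
    return uppers, values


def apply_isotonic(model, score):
    """Look up the calibrated probability for one raw score."""
    uppers, values = model
    idx = bisect_left(uppers, score)  # first step whose upper bound is >= score
    return values[min(idx, len(values) - 1)]  # scores above every bound use the last step
```

In use, the scores passed to `fit_isotonic` would come from the model trained on the resampled data but evaluated on the un-resampled held-out split, and `apply_isotonic` would then be applied to every test-time score before any probability-based decision.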