A study finds that concatenating demographic metadata with essay text in DistilBERT-based essay scoring models degrades predictive accuracy and increases scoring bias. Compared with the baseline, the experimental model achieved a lower Quadratic Weighted Kappa (0.656 vs. 0.727) and a higher validation loss (1.29 vs. 1.25), and score parity held in only 12 of 19 tests, down from 15.
Demographic Metadata Harms DistilBERT Essay Scoring
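For readers unfamiliar with the headline metric, Quadratic Weighted Kappa measures agreement between predicted and reference scores, penalizing each disagreement by the squared distance between the two scores and correcting for chance agreement. The following is a minimal sketch of the standard definition, not the study's evaluation code; the function name, the `n_classes` parameter, and the assumption that scores are integers in `0..n_classes-1` are illustrative choices.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratic Weighted Kappa for integer scores in range(n_classes).

    Illustrative helper (hypothetical name), assuming scores are already
    mapped to the integers 0 .. n_classes - 1.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")

    # Observed confusion matrix: rows are reference scores, columns predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal histograms of reference and predicted scores.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes))
                 for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic penalty: 0 on the diagonal, 1 at maximum distance.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            # Count expected by chance if the two raters were independent.
            expected = hist_true[i] * hist_pred[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Every score falls in a single class, so kappa is undefined.
        raise ValueError("kappa is undefined when all scores fall in one class")

    return 1.0 - weighted_observed / weighted_expected
```

Under this definition, a value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean agreement worse than chance, which is the scale on which the reported 0.656 and 0.727 should be read.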