Demographic Metadata Harms DistilBERT Essay Scoring

A study finds that concatenating demographic metadata with essay text in DistilBERT-based essay scoring models degrades predictive accuracy and increases scoring bias. Compared with the text-only baseline, the experimental model achieved a lower Quadratic Weighted Kappa (0.656 vs. 0.727) and a higher validation loss (1.29 vs. 1.25), and the number of score parity tests passed fell from 15 to 12 out of 19.
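
For readers unfamiliar with the headline metric, the following is a minimal sketch of how Quadratic Weighted Kappa is commonly computed for integer essay scores. It is not the study's evaluation code: the function name, its parameters, and the sample scores are illustrative assumptions. The metric compares the observed disagreement between human and model scores against the disagreement expected by chance, penalizing each disagreement by the squared distance between the two scores.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_rating=None, max_rating=None):
    """Quadratic Weighted Kappa for two equal-length lists of integer scores.

    Hypothetical helper for illustration; the rating range defaults to the
    smallest and largest score seen in either list.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    lo = min(min(y_true), min(y_pred)) if min_rating is None else min_rating
    hi = max(max(y_true), max(y_pred)) if max_rating is None else max_rating
    n = hi - lo + 1
    if n == 1:
        return 1.0  # only one possible score, so agreement is trivially perfect

    # Observed confusion matrix: rows are true scores, columns are predictions.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - lo][p - lo] += 1

    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    total = len(y_true)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic penalty in [0, 1]
            expected = hist_true[i] * hist_pred[j] / total  # chance agreement
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        return 1.0  # both raters gave the same single score throughout
    return 1.0 - weighted_observed / weighted_expected


# Illustrative scores only, not data from the study.
human_scores = [1, 2, 3, 4]
model_scores = [1, 2, 3, 4]
kappa = quadratic_weighted_kappa(human_scores, model_scores)
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. On this scale, the reported drop from 0.727 to 0.656 means the model with demographic metadata agrees less closely with the reference scores.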