An audit of 14 mainstream large language models finds that the direction of racial bias in LLM-based resume screening differs sharply by model vintage. The sole 2023-vintage model reproduces a pro-White callback gap, while every model released in 2024 or later shows either a null gap or a significant pro-Black reversal.
- The study audited 14 LLMs using the paired-resume methodology of Kline, Rose, and Walters (2022) across 24,024 postings per model.
- The sole 2023-vintage model reproduced a pro-White callback gap of +2.12 percentage points, significant at the 1% level.
- Every model released in 2024 or later showed either a null gap or a significant pro-Black reversal, with reversals as large as -3.01 percentage points.
- The same pattern of reversal across model vintages holds on the gender axis.
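
To make the sign convention in the figures above concrete, here is a minimal sketch of how a paired callback gap could be computed. It is not the study's estimator: it assumes one White-named and one Black-named resume per posting with a 0/1 callback outcome each, and it uses a plain paired z statistic. The function name `callback_gap` and the toy data are hypothetical. A positive gap is pro-White, a negative gap is pro-Black.

```python
import math
from statistics import mean, stdev


def callback_gap(white_callbacks, black_callbacks):
    """Return (gap in percentage points, z statistic) for paired 0/1 outcomes.

    Assumption: element i of each list is the callback outcome for the
    same posting, so the per-posting difference is the unit of analysis.
    """
    if len(white_callbacks) != len(black_callbacks):
        raise ValueError("paired outcomes must have equal length")
    # Per-posting difference: +1 favors the White-named resume, -1 the Black-named one.
    diffs = [w - b for w, b in zip(white_callbacks, black_callbacks)]
    n = len(diffs)
    gap = mean(diffs)
    # Standard error of the mean paired difference (sample standard deviation).
    se = stdev(diffs) / math.sqrt(n)
    z = gap / se if se > 0 else float("nan")
    return 100 * gap, z


# Toy data for ten postings (illustrative only, not from the audit).
white = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
black = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
gap_pp, z = callback_gap(white, black)
print(f"gap = {gap_pp:+.2f} pp, z = {z:.2f}")
```

With real audit data, the same calculation would be run separately for each model, which is what allows gaps to be compared across vintages.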
These results document a shift in the direction of algorithmic hiring bias across model generations, from a pro-White gap in the 2023-vintage model to null or pro-Black gaps in later models. This suggests that newer models may actively counteract historical discrimination patterns rather than merely replicating them.