The article introduces FairBED, a framework that modifies the data acquisition process itself to gather inherently fairer data, addressing biases present in existing datasets. It provides novel formulations for quantifying dataset fairness based on the principle that fair datasets should be uninformative about sensitive attributes.
- FairBED constructs practical fairness-aware Bayesian experimental design (BED) objectives that maximize the expected information gain about the target quantity while minimizing the expected information gain about sensitive attributes.
- The authors derive a theoretical link between FairBED and demographic parity.
- Empirical results show that models trained on data gathered with FairBED achieve better fairness-accuracy trade-offs than models trained on randomly acquired data or on data gathered with conventional BED.
This approach helps users obtain data that is better suited to training fair predictors, because it reduces bias during the collection phase rather than only correcting for it afterwards in the model.
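To make the trade-off concrete, here is a minimal sketch of what a fairness-aware design score could look like on a toy discrete problem. It is not the authors' objective: the penalized form `EIG(target) - lam * EIG(sensitive)`, the weight `lam`, the function names, and the probability tables are all assumptions chosen for illustration. The sketch relies only on the standard fact that the expected information gain of a design equals the mutual information between the observed outcome and the quantity of interest under that design.

```python
import numpy as np


def mutual_information(joint):
    """Mutual information (in nats) of a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()              # normalize defensively
    px = joint.sum(axis=1, keepdims=True)    # marginal over rows
    py = joint.sum(axis=0, keepdims=True)    # marginal over columns
    mask = joint > 0                         # skip zero cells (0 * log 0 = 0)
    ratio = joint[mask] / (px @ py)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))


def fairness_aware_score(joint_target, joint_sensitive, lam=1.0):
    """Hypothetical score: information about the target minus a penalty
    on information about the sensitive attribute (an assumed form)."""
    return mutual_information(joint_target) - lam * mutual_information(joint_sensitive)


def select_design(candidates, lam=1.0):
    """Return the name of the highest-scoring design and all scores."""
    scores = {
        name: fairness_aware_score(jt, js, lam)
        for name, (jt, js) in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Toy candidates (made-up numbers). Each design maps to a pair of tables:
# joint of (outcome, target) and joint of (outcome, sensitive attribute).
candidates = {
    # Informative about the target, but leaks the sensitive attribute.
    "a": ([[0.45, 0.05], [0.05, 0.45]], [[0.45, 0.05], [0.05, 0.45]]),
    # Slightly less informative, outcome independent of the sensitive attribute.
    "b": ([[0.40, 0.10], [0.10, 0.40]], [[0.25, 0.25], [0.25, 0.25]]),
}

best_plain, _ = select_design(candidates, lam=0.0)  # conventional BED-style choice
best_fair, _ = select_design(candidates, lam=1.0)   # penalized choice
print(best_plain, best_fair)
```

With `lam=0.0` the score reduces to a conventional information-gain criterion and favors the design that is most informative about the target. Raising `lam` shifts the choice toward designs whose outcomes reveal less about the sensitive attribute, which is the qualitative behavior the summary describes.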