Synthetic Dataset Generation for Fairer Unfairness Research
Proceedings of the 14th Learning Analytics and Knowledge Conference, 2024 · dl.acm.org
Recent research has made strides toward fair machine learning. Relatively few datasets, however, are commonly examined to evaluate these fairness-aware algorithms, and even fewer in education domains, which can lead to a narrow focus on particular types of fairness issues. In this paper, we describe a novel dataset modification method that utilizes a genetic algorithm to induce many types of unfairness into datasets. Additionally, our method can generate an unfairness benchmark dataset from scratch (thus avoiding data collection in situations that might exploit marginalized populations), or modify an existing dataset used as a reference point. Our method can increase the unfairness by 156.3% on average across datasets and unfairness definitions while preserving AUC scores for models trained on the original dataset (just 0.3% change, on average). We investigate the generalization of our method across educational datasets with different characteristics and evaluate three common unfairness mitigation algorithms. The results show that our method can generate datasets of varying sizes, with different types of unfairness and different feature types, that affect models trained with a range of classifiers. Datasets generated with this method can be used for benchmarking and testing in future research on the measurement and mitigation of algorithmic unfairness.
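To make the core idea concrete, here is a minimal, hypothetical sketch of how a genetic algorithm could induce unfairness into a labeled dataset. This is not the paper's implementation: it uses a toy synthetic dataset, encodes candidates as label-flip masks, measures unfairness as the demographic parity gap between two groups, and uses a flip-count penalty as a crude stand-in for the paper's AUC-preservation constraint. All names and parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: features X, binary labels y, binary group attribute g.
n = 400
X = rng.normal(size=(n, 3))
g = rng.integers(0, 2, size=n)  # protected-group membership (illustrative)
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

def demographic_parity_gap(labels, groups):
    """Absolute difference in positive-label rates between the two groups."""
    return abs(labels[groups == 0].mean() - labels[groups == 1].mean())

def fitness(mask):
    """Reward the unfairness induced by flipping labels where mask is True;
    penalize the flip rate (a rough proxy for preserving model performance)."""
    flipped = np.where(mask, 1 - y, y)
    return demographic_parity_gap(flipped, g) - 0.5 * mask.mean()

# Simple genetic algorithm over label-flip masks.
pop_size, gens, mut_rate = 30, 40, 0.01
pop = rng.random((pop_size, n)) < 0.05  # initial population of sparse flip masks
pop[0] = False                          # include the unmodified dataset as a baseline

best, best_score = None, -np.inf
for _ in range(gens):
    scores = np.array([fitness(ind) for ind in pop])
    if scores.max() > best_score:       # elitism: remember best candidate seen
        best_score = scores.max()
        best = pop[scores.argmax()].copy()
    order = np.argsort(scores)[::-1]
    parents = pop[order[: pop_size // 2]]  # selection: keep the top half
    pairs = rng.integers(0, len(parents), size=(pop_size, 2))
    cuts = rng.integers(1, n, size=pop_size)
    children = np.array([
        np.concatenate([parents[a][:c], parents[b][c:]])  # one-point crossover
        for (a, b), c in zip(pairs, cuts)
    ])
    children ^= rng.random((pop_size, n)) < mut_rate      # mutation: random bit flips
    pop = children

# Apply the best flip mask to obtain the (more) unfair labels.
y_unfair = np.where(best, 1 - y, y)
```

Because the unmodified dataset is seeded into the population and the best candidate is retained across generations, the resulting labels are guaranteed to be at least as unfair (by this metric) as the originals. A faithful reproduction of the paper's method would instead evaluate multiple unfairness definitions and measure AUC of models trained on the modified data directly.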