Don't Throw it Away! The Utility of Unlabeled Data in Fair Decision Making

Rateike, Miriam; Majumdar, Ayan; Mineeva, Olga; Gummadi, Krishna P.; Valera, Isabel

doi:10.1145/3531146.3533199

arXiv:2205.04790 (stat)

[Submitted on 10 May 2022 (v1), last revised 4 Jul 2022 (this version, v3)]

Abstract: Decision-making algorithms, in practice, are often trained on data that exhibits a variety of biases. Decision-makers often aim to make decisions based on some ground-truth target that is assumed or expected to be unbiased, i.e., equally distributed across socially salient groups. In many practical settings, the ground truth cannot be directly observed, and instead, we have to rely on a biased proxy measure of the ground truth, i.e., biased labels, in the data. In addition, data is often selectively labeled, i.e., even the biased labels are only observed for a small fraction of the data that received a positive decision. To overcome label and selection biases, recent work proposes to learn stochastic, exploring decision policies via i) online training of new policies at each time-step and ii) enforcing fairness as a constraint on performance. However, the existing approach uses only labeled data, disregarding a large amount of unlabeled data, and thereby suffers from high instability and variance in the learned decision policies at different times. In this paper, we propose a novel method based on a variational autoencoder for practical fair decision-making. Our method learns an unbiased data representation leveraging both labeled and unlabeled data and uses the representations to learn a policy in an online process. Using synthetic data, we empirically validate that our method converges to the optimal (fair) policy according to the ground truth with low variance. In real-world experiments, we further show that our training approach not only offers a more stable learning process but also yields policies with higher fairness as well as utility than previous approaches.
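
The selective-labeling setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method or experimental setup: the single feature, the noisy proxy label, the function names, and the `exploring_policy` acceptance rule are all hypothetical choices made for illustration. It only shows how a stochastic decision policy yields a labeled set (accepted individuals) and an unlabeled set (rejected individuals whose features are still available).

```python
import random


def exploring_policy(x, floor=0.1):
    """Hypothetical stochastic policy: acceptance probability grows with x
    and never drops below `floor`, so every individual can be explored."""
    return max(floor, min(1.0, 0.5 + 0.4 * x))


def simulate_selective_labels(n, accept_prob, seed=0):
    """Simulate one round of decisions in which a label is revealed only
    after a positive decision. Distributions are illustrative assumptions."""
    rng = random.Random(seed)
    labeled, unlabeled = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # one hypothetical feature
        # Noisy proxy label standing in for the (possibly biased) observed label.
        y_proxy = 1 if x + rng.gauss(0.0, 0.5) > 0 else 0
        accepted = rng.random() < accept_prob(x)  # stochastic decision
        if accepted:
            labeled.append((x, y_proxy))  # label observed
        else:
            unlabeled.append(x)  # features kept, label never revealed
    return labeled, unlabeled


labeled, unlabeled = simulate_selective_labels(1000, exploring_policy)
print(len(labeled), len(unlabeled))
```

In this toy setting, an approach that trains only on `labeled` ignores every entry in `unlabeled`; the abstract's proposal is to use both sets when learning the representation.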

Subjects:	Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Cite as:	arXiv:2205.04790 [stat.ML]
	(or arXiv:2205.04790v3 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2205.04790
Related DOI:	https://doi.org/10.1145/3531146.3533199

Submission history

From: Miriam Rateike [view email]
[v1] Tue, 10 May 2022 10:33:11 UTC (14,754 KB)
[v2] Wed, 11 May 2022 14:06:55 UTC (14,771 KB)
[v3] Mon, 4 Jul 2022 07:56:52 UTC (16,232 KB)
