Optimal Transport for Fairness: Archival Data Repair using Small Research Data Sets

Langbridge, Abigail; Quinn, Anthony; Shorten, Robert

arXiv:2403.13864 (cs)

[Submitted on 20 Mar 2024]

Abstract: With the advent of the AI Act and other regulations, there is now an urgent need for algorithms that repair unfairness in training data. In this paper, we define fairness in terms of conditional independence between protected attributes ($S$) and features ($X$), given unprotected attributes ($U$). We address the important setting in which torrents of archival data need to be repaired, using only a small proportion of these data, which are $S|U$-labelled (the research data). We use the latter to design optimal transport (OT)-based repair plans on interpolated supports. This allows off-sample, labelled, archival data to be repaired, subject to stationarity assumptions. It also significantly reduces the size of the supports of the OT plans, with correspondingly large savings in the cost of their design and of their sequential application to the off-sample data. We provide detailed experimental results with simulated and benchmark real data (the Adult data set). Our performance figures demonstrate effective repair, in the sense of quenching conditional dependence, of large quantities of off-sample, labelled (archival) data.
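
The general idea of designing a repair on a small labelled sample and then applying it to off-sample points can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in the closed-form one-dimensional OT map to a quantile barycentre, for a scalar feature, a binary $S$, and a single fixed value of $U$ (in practice the same steps would be repeated within each stratum of $U$). The function names `design_repair_map` and `repair`, the grid size `n_levels`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def design_repair_map(x_research, s_research, n_levels=50):
    """Fit per-group quantile curves on a small probability grid (assumed design)."""
    levels = np.linspace(0.0, 1.0, n_levels)
    groups = np.unique(s_research)
    quantiles = {g: np.quantile(x_research[s_research == g], levels) for g in groups}
    weights = {g: np.mean(s_research == g) for g in groups}
    # In 1D, the Wasserstein-2 barycentre has the weighted average quantile curve.
    barycentre = sum(weights[g] * quantiles[g] for g in groups)
    return levels, quantiles, barycentre

def repair(x, s, plan):
    """Apply the designed map to new points by interpolating between grid nodes."""
    levels, quantiles, barycentre = plan
    repaired = np.empty_like(x, dtype=float)
    for g, q in quantiles.items():
        mask = s == g
        # Estimated rank of each point within its own group's research distribution.
        rank = np.interp(x[mask], q, levels)
        # Send that rank to the matching quantile of the barycentre.
        repaired[mask] = np.interp(rank, levels, barycentre)
    return repaired

rng = np.random.default_rng(0)

# Small "research" sample: the feature depends on S through a mean shift.
s_res = rng.integers(0, 2, size=1000)
x_res = rng.normal(size=1000) + 2.0 * s_res

# Large "archival" sample from the same (stationary) distribution.
s_arch = rng.integers(0, 2, size=20000)
x_arch = rng.normal(size=20000) + 2.0 * s_arch

plan = design_repair_map(x_res, s_res)
x_repaired = repair(x_arch, s_arch, plan)

gap_before = x_arch[s_arch == 1].mean() - x_arch[s_arch == 0].mean()
gap_after = x_repaired[s_arch == 1].mean() - x_repaired[s_arch == 0].mean()
print(f"group mean gap before repair: {gap_before:.3f}")
print(f"group mean gap after repair:  {gap_after:.3f}")
```

The map is stored only at `n_levels` grid nodes, so its size does not grow with the archival data, and each archival point is repaired independently by interpolation. A gap in group means is only a crude proxy for dependence; it is used here just to keep the sketch short.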

Subjects:	Machine Learning (cs.LG); Computers and Society (cs.CY); Statistics Theory (math.ST)
Cite as:	arXiv:2403.13864 [cs.LG]
	(or arXiv:2403.13864v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2403.13864

Submission history

From: Abigail Langbridge
[v1] Wed, 20 Mar 2024 09:23:20 UTC (364 KB)
