On the Relationship between Data Efficiency and Error for Uncertainty Sampling

Mussmann, Stephen; Liang, Percy

arXiv:1806.06123 (cs)

[Submitted on 15 Jun 2018]

Abstract: While active learning offers potential cost savings, the actual data efficiency (the reduction in the amount of labeled data needed to obtain the same error rate) observed in practice is mixed. This paper poses a basic question: when is active learning actually helpful? We provide an answer for logistic regression with the popular active learning algorithm, uncertainty sampling. Empirically, on 21 datasets from OpenML, we find a strong inverse correlation between data efficiency and the error rate of the final classifier. Theoretically, we show that for a variant of uncertainty sampling, the asymptotic data efficiency is within a constant factor of the inverse error rate of the limiting classifier.
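To make the setting concrete, the following is a minimal sketch of uncertainty sampling with logistic regression, written in plain Python. It is not the authors' code or the variant analyzed in the paper. The synthetic two-Gaussian data, the gradient-descent fitting routine, and all function names (`fit_logistic`, `uncertainty_sampling`, `data_efficiency`) are assumptions made for illustration. Data efficiency is taken here to be the ratio of the number of labels passive (random) sampling needs to the number active learning needs to reach the same error rate, which is one natural reading of the definition in the abstract.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, steps=200, lr=0.5):
    """Fit weights (bias stored last) by batch gradient descent on the log loss."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[d])
            for j in range(d):
                grad[j] += (p - yi) * xi[j]
            grad[d] += p - yi
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, x):
    # Probability of class 1 under the fitted model.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def uncertainty_sampling(X_pool, y_pool, n_seed, n_queries, rng):
    """Label n_seed random points, then repeatedly query the unlabeled point
    whose predicted probability is closest to 0.5 (the most uncertain one)."""
    labeled = rng.sample(range(len(X_pool)), n_seed)
    seen = set(labeled)
    unlabeled = [i for i in range(len(X_pool)) if i not in seen]
    for _ in range(n_queries):
        w = fit_logistic([X_pool[i] for i in labeled],
                         [y_pool[i] for i in labeled])
        pick = min(unlabeled,
                   key=lambda i: abs(predict_proba(w, X_pool[i]) - 0.5))
        labeled.append(pick)
        unlabeled.remove(pick)
    return labeled


def data_efficiency(n_passive, n_active):
    # Assumed definition: labels needed by passive learning divided by
    # labels needed by active learning to reach the same error rate.
    return n_passive / n_active


def make_data(n, rng):
    # Hypothetical synthetic data: two overlapping 2-D Gaussian classes.
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


rng = random.Random(0)
X, y = make_data(200, rng)
labeled = uncertainty_sampling(X, y, n_seed=10, n_queries=20, rng=rng)
print(len(labeled), "points labeled; last queried index:", labeled[-1])
```

Under the assumed definition, a task where passive learning needs 400 labels and uncertainty sampling needs 100 to reach the same error has `data_efficiency(400, 100)` equal to 4. The abstract's theoretical claim then says that, asymptotically, this ratio grows roughly like the inverse of the limiting classifier's error rate, up to a constant factor.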

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:1806.06123 [cs.LG]
(or arXiv:1806.06123v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.1806.06123

Submission history

From: Stephen Mussmann
[v1] Fri, 15 Jun 2018 20:47:50 UTC (523 KB)
