Denoising Multi-Source Weak Supervision for Neural Text Classification

Ren, Wendi; Li, Yinghao; Su, Hanting; Kartchner, David; Mitchell, Cassie; Zhang, Chao

doi:10.18653/v1/2020.findings-emnlp.334

Computer Science > Computation and Language

arXiv:2010.04582 (cs)

[Submitted on 9 Oct 2020]

Title: Denoising Multi-Source Weak Supervision for Neural Text Classification

Authors: Wendi Ren, Yinghao Li, Hanting Su, David Kartchner, Cassie Mitchell, Chao Zhang


Abstract: We study the problem of learning neural text classifiers without using any labeled data, but only easy-to-provide rules as multiple weak supervision sources. This problem is challenging because rule-induced weak labels are often noisy and incomplete. To address these two challenges, we design a label denoiser, which estimates the source reliability using a conditional soft attention mechanism and then reduces label noise by aggregating rule-annotated weak labels. The denoised pseudo labels then supervise a neural classifier to predict soft labels for unmatched samples, which addresses the rule coverage issue. We evaluate our model on five benchmarks for sentiment, topic, and relation classification. The results show that our model consistently outperforms state-of-the-art weakly-supervised and semi-supervised methods, and achieves performance comparable to fully-supervised methods even without any labeled data. Our code can be found at this https URL.
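
The aggregation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the `ABSTAIN` marker, and the fixed per-rule scores are all assumptions made for illustration. In particular, the fixed scores stand in for the reliability that the paper estimates with a conditional soft attention mechanism. The sketch only shows how reliability-weighted voting turns several noisy rule labels into one soft label, and how a sample that no rule matches is left without a pseudo label (the coverage issue).

```python
import math

ABSTAIN = -1  # hypothetical marker: the rule did not match this sample


def aggregate_weak_labels(votes, reliability_scores, num_classes):
    """Combine one sample's rule votes into a soft label.

    votes: one class index per rule, or ABSTAIN when the rule did not fire.
    reliability_scores: one unnormalized score per rule. They are fixed
        numbers here (an assumption); a learned denoiser would predict them.
    Returns a list of class probabilities, or None if no rule matched.
    """
    matched = [(v, s) for v, s in zip(votes, reliability_scores) if v != ABSTAIN]
    if not matched:
        return None  # unmatched sample: no pseudo label can be formed

    # Softmax over the scores of the matched rules only (numerically stable).
    max_score = max(s for _, s in matched)
    weights = [math.exp(s - max_score) for _, s in matched]
    total = sum(weights)

    soft_label = [0.0] * num_classes
    for (vote, _), weight in zip(matched, weights):
        soft_label[vote] += weight / total
    return soft_label


# Three rules: the first is assumed more reliable than the other two.
scores = [2.0, 0.0, 0.0]
print(aggregate_weak_labels([1, 0, ABSTAIN], scores, num_classes=2))
print(aggregate_weak_labels([ABSTAIN, ABSTAIN, ABSTAIN], scores, num_classes=2))
```

In the first call the more reliable rule outweighs the conflicting one, so class 1 receives most of the probability mass. The second call returns `None`, which is the kind of sample the abstract says the neural classifier must label itself.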

Comments:	16 pages, 7 figures
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2010.04582 [cs.CL]
	(or arXiv:2010.04582v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2010.04582
Related DOI:	https://doi.org/10.18653/v1/2020.findings-emnlp.334

Submission history

From: Wendi Ren
[v1] Fri, 9 Oct 2020 13:57:52 UTC (630 KB)

