Improving Disfluency Detection by Self-Training a Self-Attentive Model

Lou, Paria Jamshid; Johnson, Mark

arXiv:2004.05323 (cs)

[Submitted on 11 Apr 2020 (v1), last revised 29 Apr 2020 (this version, v2)]

Abstract: Self-attentive neural syntactic parsers using contextualized word embeddings (e.g. ELMo or BERT) currently produce state-of-the-art results in joint parsing and disfluency detection in speech transcripts. Since the contextualized word embeddings are pre-trained on a large amount of unlabeled data, using additional unlabeled data to train a neural model might seem redundant. However, we show that self-training, a semi-supervised technique for incorporating unlabeled data, sets a new state of the art for the self-attentive parser on disfluency detection, demonstrating that self-training provides benefits orthogonal to the pre-trained contextualized word representations. We also show that ensembling self-trained parsers provides further gains for disfluency detection.
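The self-training idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' parser or training setup: it assumes a toy nearest-centroid classifier over one-dimensional features, and the names `fit_centroids`, `predict`, `self_train`, the labels, and the data are all hypothetical. It only shows the general loop: train on labeled data, label the unlabeled data with the current model, then retrain on the union.

```python
from statistics import mean


def fit_centroids(xs, ys):
    """Toy 'model': one centroid per label for 1-D features (illustrative only)."""
    return {
        label: mean(x for x, y in zip(xs, ys) if y == label)
        for label in set(ys)
    }


def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def self_train(labeled_x, labeled_y, unlabeled_x, rounds=1):
    """Generic self-training loop (an assumption, not the paper's exact recipe)."""
    # Step 1: train an initial model on the labeled data only.
    centroids = fit_centroids(labeled_x, labeled_y)
    for _ in range(rounds):
        # Step 2: label the unlabeled data with the current model.
        pseudo_y = [predict(centroids, x) for x in unlabeled_x]
        # Step 3: retrain on labeled plus pseudo-labeled data.
        centroids = fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)
    return centroids


# Hypothetical data: small labeled set, larger pool would normally be unlabeled.
labeled_x = [0.0, 1.0, 9.0, 10.0]
labeled_y = ["fluent", "fluent", "disfluent", "disfluent"]
unlabeled_x = [2.0, 3.0, 7.0, 8.0]

model = self_train(labeled_x, labeled_y, unlabeled_x)
```

In this toy setting the pseudo-labeled points pull each centroid toward the unlabeled data, which is the sense in which unlabeled data changes the model. Practical systems often also filter pseudo-labels by confidence; that step is omitted here.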

Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2004.05323 [cs.CL] (or arXiv:2004.05323v2 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2004.05323

Submission history

From: Paria Jamshid Lou
[v1] Sat, 11 Apr 2020 06:53:08 UTC (87 KB)
[v2] Wed, 29 Apr 2020 06:44:14 UTC (106 KB)
