SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding

Chung, Yu-An; Zhu, Chenguang; Zeng, Michael

arXiv:2010.02295 (cs)

[Submitted on 5 Oct 2020 (v1), last revised 15 Mar 2021 (this version, v3)]

Abstract: Spoken language understanding (SLU) requires a model to analyze an input acoustic signal to understand its linguistic content and make predictions. To boost model performance, various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text. However, the inherent disparities between the two modalities necessitate a mutual analysis. In this paper, we propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules. Besides conducting a self-supervised masked language modeling task on the two individual modules using unpaired speech and text, SPLAT aligns representations from the two modules in a shared latent space using a small amount of paired speech and text. Thus, during fine-tuning, the speech module alone can produce representations carrying both acoustic information and contextual semantic knowledge of an input acoustic signal. Experimental results verify the effectiveness of our approach on various SLU tasks. For example, SPLAT improves the previous state-of-the-art performance on the Spoken SQuAD dataset by more than 10%.
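
As an illustration of the joint objective described in the abstract, the following is a minimal sketch of how two masked-modeling losses and a cross-modal alignment term might be combined. The abstract does not state the distance measure, the pooling, or the loss weighting, so the L1 distance, the utterance-level vectors, and the names `alignment_loss`, `joint_loss`, and `align_weight` are all assumptions made for illustration, not the authors' implementation.

```python
def alignment_loss(speech_vecs, text_vecs):
    """Mean L1 distance between paired utterance-level embeddings.

    speech_vecs[i] and text_vecs[i] are assumed to be fixed-size vectors
    produced by the speech and language modules for the same paired example.
    The L1 distance is an illustrative choice, not taken from the abstract.
    """
    if len(speech_vecs) != len(text_vecs) or not speech_vecs:
        raise ValueError("need a non-empty batch of paired embeddings")
    total = 0.0
    for s, t in zip(speech_vecs, text_vecs):
        if len(s) != len(t):
            raise ValueError("paired embeddings must share a dimension")
        total += sum(abs(a - b) for a, b in zip(s, t))
    return total / len(speech_vecs)


def joint_loss(speech_mlm_loss, text_mlm_loss, align_loss, align_weight=1.0):
    """Combine the two self-supervised losses with the alignment term.

    align_weight is a hypothetical hyperparameter; the abstract does not
    say how the terms are weighted.
    """
    return speech_mlm_loss + text_mlm_loss + align_weight * align_loss
```

In this sketch the two masked-modeling losses would come from unpaired speech and text batches, while `alignment_loss` would be computed only on the small paired subset.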

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2010.02295 [cs.CL]
	(or arXiv:2010.02295v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2010.02295
Journal reference:	North American Chapter of the Association for Computational Linguistics (NAACL), Mexico City, Mexico, 2021

Submission history

From: Chenguang Zhu
[v1] Mon, 5 Oct 2020 19:29:49 UTC (995 KB)
[v2] Fri, 12 Mar 2021 01:38:07 UTC (1,015 KB)
[v3] Mon, 15 Mar 2021 00:55:04 UTC (1,015 KB)