Semi-supervised acoustic model training for five-lingual code-switched ASR

Biswas, Astik; Yılmaz, Emre; de Wet, Febe; van der Westhuizen, Ewald; Niesler, Thomas

arXiv:1906.08647 (cs)

[Submitted on 20 Jun 2019 (v1), last revised 15 Oct 2019 (this version, v2)]

Abstract: This paper presents recent progress in the acoustic modelling of under-resourced code-switched (CS) speech in multiple South African languages. We consider two approaches. The first constructs separate bilingual acoustic models corresponding to language pairs (English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho). The second constructs a single unified five-lingual acoustic model representing all the languages (English, isiZulu, isiXhosa, Setswana and Sesotho). For these two approaches we consider the effectiveness of semi-supervised training to increase the size of the very sparse acoustic training sets. Using approximately 11 hours of untranscribed speech, we show that both approaches benefit from semi-supervised training. The bilingual TDNN-F acoustic models also benefit from the addition of CNN layers (CNN-TDNN-F), while the five-lingual system does not show any significant improvement. Furthermore, because English is common to all language pairs in our data, it dominates when training a unified language model, leading to improved English ASR performance at the expense of the other languages. Nevertheless, the five-lingual model offers flexibility because it can process more than two languages simultaneously, and is therefore an attractive option as an automatic transcription system in a semi-supervised training pipeline.
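
The semi-supervised training idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it only assumes the generic pattern of training a seed model on manually transcribed speech, using it to transcribe untranscribed audio, and retraining on the combined set. The names `Utterance`, `semi_supervised_round`, `toy_train` and `toy_transcribe` are hypothetical, and the two toy functions are placeholders for calls into a real ASR toolkit.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Utterance:
    audio_id: str
    transcript: str
    manual: bool  # True for human transcriptions, False for automatic ones


def semi_supervised_round(
    train_fn: Callable[[List[Utterance]], object],
    transcribe_fn: Callable[[object, str], str],
    transcribed: List[Utterance],
    untranscribed_ids: List[str],
) -> Tuple[object, List[Utterance]]:
    """One round: train a seed model, transcribe new audio, retrain on the union."""
    # Step 1: seed model from the manually transcribed data only.
    seed_model = train_fn(transcribed)
    # Step 2: automatic transcriptions for the untranscribed audio.
    automatic = [
        Utterance(audio_id, transcribe_fn(seed_model, audio_id), manual=False)
        for audio_id in untranscribed_ids
    ]
    # Step 3: retrain on manual plus automatic transcriptions.
    pool = list(transcribed) + automatic
    return train_fn(pool), pool


# Toy stand-ins (assumptions) so the sketch runs without any ASR toolkit.
def toy_train(utterances):
    return {"n_train": len(utterances)}


def toy_transcribe(model, audio_id):
    return f"<hypothesis for {audio_id}>"


manual = [Utterance("utt1", "placeholder transcript", True)]
model, pool = semi_supervised_round(toy_train, toy_transcribe, manual, ["utt2", "utt3"])
print(model["n_train"], sum(not u.manual for u in pool))  # prints "3 2"
```

In this sketch the transcriber is passed in as a function, so the same round could be run with either a bilingual or a five-lingual system as the automatic transcriber. Any filtering of the automatic transcriptions before retraining is left out here and would be a separate design choice.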

Comments:	Accepted for publication at Interspeech 2019
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1906.08647 [cs.CL]
	(or arXiv:1906.08647v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1906.08647

Submission history

From: Emre Yilmaz
[v1] Thu, 20 Jun 2019 14:11:55 UTC (1,288 KB)
[v2] Tue, 15 Oct 2019 13:45:15 UTC (1,287 KB)
