Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition

Wang, Xuefei; Long, Yanhua; Li, Yijie; Wei, Haoran

arXiv:2306.11309 (cs)

[Submitted on 20 Jun 2023]

Abstract: Low-resource accented speech recognition is one of the important challenges that current ASR technology faces in practical applications. In this study, we propose a Conformer-based architecture, called Aformer, that leverages acoustic information from both large non-accented and limited accented training data. Specifically, the Aformer contains a general encoder and an accent encoder designed to extract complementary acoustic information. Moreover, we propose to train the Aformer in a multi-pass manner, and we investigate three cross-information fusion methods to effectively combine the information from the general and accent encoders. All experiments are conducted on both accented English and accented Mandarin ASR tasks. Results show that our proposed methods outperform the strong Conformer baseline, with relative word/character error rate reductions of 10.2% to 24.5% on six in-domain and out-of-domain accented test sets.
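The relative error rate reduction quoted in the abstract is conventionally computed as the drop in error rate divided by the baseline error rate. The sketch below illustrates that arithmetic only. The function name and the input numbers are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline: float, proposed: float) -> float:
    """Return the relative reduction, in percent, of an error rate (WER or CER).

    Both arguments are error rates expressed in the same unit (e.g. percent).
    """
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical example values, NOT results from the paper.
baseline_wer = 10.0  # baseline system WER in percent (assumed)
proposed_wer = 8.0   # proposed system WER in percent (assumed)

reduction = relative_error_reduction(baseline_wer, proposed_wer)
print(f"{reduction:.1f}% relative reduction")
```

With these assumed inputs, an absolute drop of 2 points from a 10% baseline corresponds to a 20% relative reduction, which is why relative figures read larger than the absolute differences behind them.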

Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Cite as:	arXiv:2306.11309 [cs.SD]
	(or arXiv:2306.11309v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2306.11309

Submission history

From: Haoran Wei
[v1] Tue, 20 Jun 2023 06:08:09 UTC (524 KB)
