U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition

Wu, Di; Zhang, Binbin; Yang, Chao; Peng, Zhendong; Xia, Wenjing; Chen, Xiaoyu; Lei, Xin


arXiv:2106.05642 (cs)

[Submitted on 10 Jun 2021 (v1), last revised 30 Dec 2021 (this version, v3)]


Abstract: The unified streaming and non-streaming two-pass (U2) end-to-end model for speech recognition has shown strong performance in terms of streaming capability, accuracy, real-time factor (RTF), and latency. In this paper, we present U2++, an enhanced version of U2 that further improves accuracy. The core idea of U2++ is to use the forward and backward information of the label sequences simultaneously during training to learn richer information, and to combine the forward and backward predictions at decoding time to give more accurate recognition results. We also propose a new data augmentation method called SpecSub to make the U2++ model more accurate and robust. Our experiments show that, compared with U2, U2++ converges faster in training, is more robust to the choice of decoding method, and achieves a consistent 5% to 8% word error rate reduction. On AISHELL-1, U2++ achieves a 4.63% character error rate (CER) in a non-streaming setup and 5.05% in a streaming setup with 320 ms latency. To the best of our knowledge, 5.05% is the best published streaming result on the AISHELL-1 test set.
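To make the decoding idea concrete, the sketch below shows one plausible way to combine forward and backward predictions when rescoring first-pass candidates. The abstract does not give the combination rule, so the weighted sum used here, the names `Hypothesis`, `combined_score`, `rescore`, `ctc_weight`, and `reverse_weight`, and all score values are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One first-pass candidate with illustrative log-probability scores."""
    tokens: List[str]
    ctc_score: float  # first-pass score (assumed to be a log-probability)
    l2r_score: float  # score from a left-to-right (forward) decoder
    r2l_score: float  # score from a right-to-left (backward) decoder


def combined_score(hyp: Hypothesis,
                   ctc_weight: float = 0.5,
                   reverse_weight: float = 0.3) -> float:
    """Assumed weighted sum of forward, backward, and first-pass scores."""
    # Interpolate between the two decoder directions.
    attention = ((1.0 - reverse_weight) * hyp.l2r_score
                 + reverse_weight * hyp.r2l_score)
    # Add the weighted first-pass score on top.
    return attention + ctc_weight * hyp.ctc_score


def rescore(nbest: List[Hypothesis],
            ctc_weight: float = 0.5,
            reverse_weight: float = 0.3) -> Hypothesis:
    """Return the candidate with the highest combined score."""
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")
    return max(nbest,
               key=lambda h: combined_score(h, ctc_weight, reverse_weight))


# Made-up scores: the forward decoder slightly prefers the first candidate,
# while the backward decoder strongly prefers the second one.
example_nbest = [
    Hypothesis(["a", "b"], ctc_score=-1.0, l2r_score=-2.0, r2l_score=-6.0),
    Hypothesis(["a", "c"], ctc_score=-1.2, l2r_score=-2.4, r2l_score=-2.5),
]

forward_only = rescore(example_nbest, reverse_weight=0.0)
bidirectional = rescore(example_nbest, reverse_weight=0.3)
print(forward_only.tokens, bidirectional.tokens)
```

With `reverse_weight=0.0` only the forward decoder and the first-pass score are used, so the first candidate is selected. Giving the backward decoder some weight changes the choice to the second candidate, which is the kind of correction a bidirectional rescoring pass is meant to allow.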

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2106.05642 [cs.SD] (or arXiv:2106.05642v3 [cs.SD] for this version)
DOI: https://doi.org/10.48550/arXiv.2106.05642

Submission history

From: Binbin Zhang
[v1] Thu, 10 Jun 2021 10:25:15 UTC (672 KB)
[v2] Wed, 7 Jul 2021 07:38:58 UTC (672 KB)
[v3] Thu, 30 Dec 2021 00:30:30 UTC (672 KB)

