Pushing the limits of non-autoregressive speech recognition

EG Ng, CC Chiu, Y Zhang, W Chan - arXiv preprint arXiv:2104.03416, 2021 - arxiv.org
We apply recent advancements in end-to-end speech recognition to non-autoregressive automatic speech recognition, pushing the limits of non-autoregressive state-of-the-art results on multiple datasets: LibriSpeech, Fisher+Switchboard, and Wall Street Journal. Key to our recipe is leveraging CTC on giant Conformer neural network architectures with SpecAugment and wav2vec2 pre-training. We achieve 1.8%/3.6% WER on the LibriSpeech test/test-other sets, 5.1%/9.8% WER on Switchboard, and 3.4% WER on Wall Street Journal, all without a language model.
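
The non-autoregressive element of this recipe is CTC decoding: the model scores every output frame in parallel and a simple collapse rule turns the frame-level predictions into a transcript, with no token-by-token generation. Below is a minimal Python sketch of standard greedy CTC collapse (the function name and BLANK_ID are illustrative assumptions, not taken from the paper):

    # Greedy CTC decoding sketch: collapse repeated frame-level predictions,
    # then drop the blank symbol. All frames are decoded in parallel
    # (non-autoregressive), unlike autoregressive decoders that condition
    # each token on previously emitted tokens.
    from typing import List

    BLANK_ID = 0  # assumed blank-token index; real vocabularies differ

    def ctc_greedy_decode(frame_ids: List[int]) -> List[int]:
        """Collapse consecutive repeats, then remove blanks (standard CTC rule)."""
        decoded = []
        prev = None
        for token in frame_ids:
            if token != prev and token != BLANK_ID:
                decoded.append(token)
            prev = token
        return decoded

    # Example: frame-level argmax output -> token sequence
    print(ctc_greedy_decode([0, 7, 7, 0, 0, 3, 3, 3, 0, 7]))  # -> [7, 3, 7]

In practice the frame-level predictions would come from the argmax over the CTC output distribution of a Conformer encoder; the collapse rule itself is model-agnostic.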