IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach
Hansjörg HOFMANNSakriani SAKTIChiori HORIHideki KASHIOKASatoshi NAKAMURAWolfgang MINKER
Author information
JOURNAL FREE ACCESS

2012 Volume E95.D Issue 8 Pages 2084-2093

Details
Abstract

The performance of English automatic speech recognition systems decreases when recognizing spontaneous speech mainly due to multiple pronunciation variants in the utterances. Previous approaches address this problem by modeling the alteration of the pronunciation on a phoneme to phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this article, the sequence-based pronunciation variation is modeled using a noisy channel approach where the spontaneous phoneme sequence is considered as a “noisy” string and the goal is to recover the “clean” string of the word sequence. Hereby, the whole word sequence and its effect on the alternation of the phonemes will be taken into consideration. Moreover, the system not only learns the phoneme transformation but also the mapping from the phoneme to the word directly. In this study, first the phonemes will be recognized with the present recognition system and afterwards the pronunciation variation model based on the noisy channel approach will map from the phoneme to the word level. Two well-known natural language processing approaches are adopted and derived from the noisy channel model theory: Joint-sequence models and statistical machine translation. Both of them are applied and various experiments are conducted using microphone and telephone of spontaneous speech.

Content from these authors
© 2012 The Institute of Electronics, Information and Communication Engineers
Previous article Next article
feedback
Top