Non-Uniform Boosted MCE Training of Deep Neural Networks for Keyword Spotting

Z Meng, BH Juang - INTERSPEECH, 2016 - isca-archive.org

Large vocabulary continuous speech recognition (LVCSR) has achieved extraordinary performance when the speech is read or dictated. For instance, a word accuracy higher than 90% can be expected on the Wall Street Journal task. However, this performance degrades sharply on spontaneous conversational speech recognition tasks [2], because such speech is a stream of words with no overt lexical marking of punctuation, and disfluencies (i.e., filled pauses, repetitions, repairs, and false starts) occur frequently in natural conversation [3]. In real applications, it is more important to understand spontaneous speech semantically than to recognize its word transcription. Moreover, the semantic meaning generally resides in a set of keywords in the spoken utterances. Therefore, keyword spotting techniques become crucial for spontaneous conversational speech recognition tasks.
