Non-Uniform Boosted MCE Training of Deep Neural Networks for Keyword Spotting
Z Meng, BH Juang - INTERSPEECH, 2016 - isca-archive.org
Large vocabulary continuous speech recognition (LVCSR) has achieved extraordinary performance when the speech is read or dictated. For instance, a word accuracy higher than 90% can be expected on the Wall Street Journal task. However, this performance decreases tremendously on spontaneous conversational speech recognition tasks [2], because such speech consists of a stream of words with no overt lexical marking of punctuation, and disfluencies (i.e., filled pauses, repetitions, repairs, and false starts) may occur frequently in natural conversation [3]. In real applications, however, it is more important to understand spontaneous speech semantically than to recognize its word transcription. Moreover, the semantic meaning generally resides in a set of keywords in the spoken utterances. Therefore, keyword spotting techniques become crucial for spontaneous conversational speech recognition tasks.