Weighting schemes for audio-visual fusion in speech recognition
H Glotin, D Vergyri, C Neti… - … on Acoustics, Speech …, 2001 - ieeexplore.ieee.org
2001 IEEE International Conference on Acoustics, Speech, and …, 2001•ieeexplore.ieee.org
We demonstrate an improvement in state-of-the-art large vocabulary continuous speech recognition (LVCSR) performance, under both clean and noisy conditions, by using visual information in addition to the traditional audio stream. We take a decision fusion approach, in which single-modality (audio-only and visual-only) HMM classifiers are combined to recognize audio-visual speech. More specifically, we tackle the problem of estimating appropriate combination weights for each modality. Two techniques are described. The first uses an automatically extracted estimate of audio stream reliability to adjust the weight of each modality; results are reported for both clean and noisy audio. The second is a discriminative model combination approach in which weights on pre-defined model classes are optimized to minimize word error rate (WER); results are reported for clean audio only.
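The weighted decision fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the convex (weights sum to one) combination of log-likelihoods, and the sigmoid mapping from a reliability score to an audio weight are all assumptions chosen for illustration, and the scores are made-up numbers.

```python
import math


def weight_from_reliability(reliability, slope=1.0, midpoint=0.0):
    """Map an audio reliability score to an audio weight in (0, 1).

    Hypothetical mapping: the abstract does not say how reliability
    is turned into a weight, so a sigmoid is assumed here.
    """
    return 1.0 / (1.0 + math.exp(-slope * (reliability - midpoint)))


def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight):
    """Combine per-hypothesis log-likelihoods from the two classifiers.

    Assumes a convex combination: the visual stream receives
    (1 - audio_weight).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return {
        hyp: audio_weight * audio_ll[hyp]
        + (1.0 - audio_weight) * visual_ll[hyp]
        for hyp in audio_ll
    }


def best_hypothesis(scores):
    """Return the hypothesis with the highest fused score."""
    return max(scores, key=scores.get)


# Illustrative (invented) log-likelihoods for two competing hypotheses.
audio_ll = {"bat": -10.0, "pat": -9.0}
visual_ll = {"bat": -4.0, "pat": -8.0}

# Reliable audio: the audio stream dominates the decision.
clean_choice = best_hypothesis(fuse_log_likelihoods(audio_ll, visual_ll, 0.9))

# Unreliable audio: the visual stream dominates the decision.
noisy_choice = best_hypothesis(fuse_log_likelihoods(audio_ll, visual_ll, 0.2))
```

With these numbers, a high audio weight selects the hypothesis favored by the audio classifier, while a low audio weight lets the visual classifier override it, which is the behavior a reliability-driven weighting scheme aims for when the audio is noisy.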