Weighting schemes for audio-visual fusion in speech recognition

H Glotin, D Vergyri, C Neti, G Potamianos, J Luettin

2001 IEEE International Conference on Acoustics, Speech, and …, 2001 • ieeexplore.ieee.org

We demonstrate an improvement in state-of-the-art large vocabulary continuous speech recognition (LVCSR) performance, under both clean and noisy conditions, by using visual information in addition to the traditional audio stream. We take a decision fusion approach to the audio-visual information, in which the single-modality (audio-only and visual-only) HMM classifiers are combined to recognize audio-visual speech. More specifically, we tackle the problem of estimating appropriate combination weights for each modality. Two techniques are described. The first uses an automatically extracted estimate of audio stream reliability to modify the weights for each modality (results are reported for both clean and noisy audio). The second is a discriminative model combination approach in which weights on pre-defined model classes are optimized to minimize word error rate (WER) (results are reported for clean audio only).
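The weighted decision fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a common log-linear form in which per-class log-likelihoods from the audio-only and visual-only classifiers are combined with weights that sum to one, and it assumes the audio reliability estimate is already scaled to [0, 1]. The function names (`reliability_to_weight`, `fuse_log_likelihoods`, `recognize`) and the toy scores are hypothetical and chosen only for illustration.

```python
def reliability_to_weight(reliability):
    """Map an audio reliability estimate to an audio stream weight.

    Assumption: the reliability is already scaled to [0, 1]; values
    outside that range are clamped. The abstract does not specify the
    actual mapping, so this identity-with-clamping rule is a placeholder.
    """
    return min(max(reliability, 0.0), 1.0)


def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight):
    """Combine per-class log-likelihoods from two single-modality classifiers.

    The fused score is audio_weight * audio + (1 - audio_weight) * visual,
    i.e. a weighted product of likelihoods expressed in the log domain.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return {
        label: audio_weight * audio_ll[label] + visual_weight * visual_ll[label]
        for label in audio_ll
    }


def recognize(audio_ll, visual_ll, reliability):
    """Return the class with the highest fused score."""
    weight = reliability_to_weight(reliability)
    fused = fuse_log_likelihoods(audio_ll, visual_ll, weight)
    return max(fused, key=fused.get)


# Toy scores (hypothetical): the audio stream slightly prefers "ba",
# while the visual stream strongly prefers "da".
audio_scores = {"ba": -10.0, "da": -12.0}
visual_scores = {"ba": -9.0, "da": -3.0}

clean_decision = recognize(audio_scores, visual_scores, reliability=0.9)
noisy_decision = recognize(audio_scores, visual_scores, reliability=0.2)
```

With these toy scores, a high reliability estimate lets the audio stream dominate the decision, while a low estimate shifts the decision toward the visual stream. The second technique in the abstract, which optimizes weights on pre-defined model classes to minimize WER, would replace the fixed mapping with learned weights and is not sketched here.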
