Robust speech enhancement techniques for ASR in non-stationary noise and dynamic environments
G Liu, D Dimitriadis, E Bocchieri - INTERSPEECH, 2013 - academia.edu
Abstract
In current ASR systems, the presence of competing speakers greatly degrades recognition performance. The problem is even more pronounced in hands-free, far-field ASR systems such as “Smart-TV” systems, where reverberation and non-stationary noise pose additional challenges. Furthermore, speakers most often do not stand still while speaking. To address these issues, we propose a cascaded system that includes Time Difference of Arrival (TDOA) estimation, multi-channel Wiener filtering, nonnegative matrix factorization (NMF), multi-condition training, and robust feature extraction, each of which additively improves overall performance. The final cascaded system achieves average relative improvements in ASR word accuracy of 50% and 45% on the CHiME 2011 (non-stationary noise) and CHiME 2012 (non-stationary noise plus speaker head movement) tasks, respectively.
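As an illustration of the first stage named in the abstract, Time Differences of Arrival estimation between two microphone channels, the sketch below uses GCC-PHAT (generalized cross-correlation with phase transform). This is an assumption made for illustration: the abstract does not say which TDOA estimator the authors use, and the function name `gcc_phat_tdoa` and its parameters are hypothetical.

```python
import numpy as np


def gcc_phat_tdoa(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref`.

    Illustrative GCC-PHAT sketch; not taken from the paper.
    A positive result means `sig` arrives later than `ref`.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    sig_fft = np.fft.rfft(sig, n=n)
    ref_fft = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting (keep phase, drop magnitude).
    cross = sig_fft * np.conj(ref_fft)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Limit the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


if __name__ == "__main__":
    fs = 16000                      # assumed sampling rate in Hz
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(1024)
    # Second channel: the same signal delayed by 5 samples.
    sig = np.concatenate((np.zeros(5), ref[:-5]))
    print(gcc_phat_tdoa(sig, ref, fs) * fs)  # estimated delay in samples
```

In a multi-microphone front end, a delay estimate of this kind is typically used to steer or align channels before spatial filtering; how the paper's cascade uses its TDOA estimates is not stated in the abstract.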