Distant, Multichannel Speech Recognition Using Microphone Array Coding and Cloud-Based Beamforming with a Self-Attention Channel Combinator

D Sharma, D Jones, S Kruchinin, R Gong, PA Naylor

2023 57th Asilomar Conference on Signals, Systems, and Computers, 2023 • ieeexplore.ieee.org

Distant automatic speech recognition (ASR) holds the promise of a more natural human-machine interface, and using multiple microphones to acquire speech in such environments often leads to better ASR accuracy. The benefits come from encoding spatial information, which can be used to enhance the speech and estimate the direction of sound arrival. Current ASR systems are based on end-to-end models that require considerable computational resources and are typically deployed in the cloud, which requires a codec to reduce the transmission bandwidth. We present a multichannel speech coding scheme specifically adapted for microphone array signals. Unlike typical speech codecs, this scheme preserves the phase relationships between the signals so that the spatial information can be exploited in the cloud. We explore the use of a frequency-domain relative transfer function estimator as part of the codec. We also explore the use of a modified discrete cosine transform (MDCT) based Self-Attention Channel Combinator (SACC) front-end for ASR and show that the time-domain signal after SACC processing leads to significant improvements in C50. Furthermore, we show that preprocessing the array signals with a dereverberation method leads to a lower word error rate (WER) and more accurate direction-of-arrival (DOA) estimation.
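
The abstract reports results in terms of C50 and WER without defining either. As a rough illustration only, the Python sketch below computes both under their conventional textbook definitions: C50 as the ratio, in dB, of the energy in the first 50 ms of a room impulse response to the energy after it, and WER as the word-level edit distance divided by the number of reference words. The function names `c50_db` and `word_error_rate` are hypothetical, and the abstract does not say how the authors measure either quantity, so this should not be read as their evaluation procedure.

```python
import math


def c50_db(rir, fs):
    """Clarity index C50 in dB from a room impulse response.

    Assumption: rir[0] is the direct-path arrival, so the first 50 ms
    of samples count as "early" energy and the rest as "late" energy.
    """
    split = int(round(0.050 * fs))  # number of samples in 50 ms
    early = sum(h * h for h in rir[:split])
    late = sum(h * h for h in rir[split:])
    if late == 0.0:
        return math.inf  # no late reverberant energy at all
    if early == 0.0:
        return -math.inf  # no early energy at all
    return 10.0 * math.log10(early / late)


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Toy impulse response at 1 kHz: ten unit taps early, one unit tap late.
    toy_rir = [1.0] * 10 + [0.0] * 40 + [1.0]
    print(c50_db(toy_rir, fs=1000))
    print(word_error_rate("turn on the kitchen light",
                          "turn on kitchen lights"))
```

Under these definitions a higher C50 means less late reverberation relative to the early sound, and a lower WER means fewer recognition errors, which is the direction of improvement the abstract describes.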
