Exploring tradeoffs in models for low-latency speech enhancement

K Wilson, M Chinen, J Thorpe, B Patton… - … on Acoustic Signal …, 2018 - ieeexplore.ieee.org
We explore a variety of neural network configurations for one- and two-channel
spectrogram-mask-based speech enhancement. Our best model improves on previous
state-of-the-art performance on the CHiME2 speech enhancement task by 0.4 decibels
in signal-to-distortion ratio (SDR). We examine trade-offs such as non-causal
look-ahead, computation, and parameter count versus enhancement performance and find
that zero-look-ahead models can achieve, on average, performance within 0.03 dB SDR
of our best bidirectional model …
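As a rough illustration of what "spectrogram-mask-based" enhancement and SDR refer to, the Python sketch below multiplies the spectrogram of a noisy signal by a mask, resynthesizes it, and compares SDR before and after. It is not the authors' model. The framing (non-overlapping rectangular frames), the oracle ratio mask computed from the true clean and noise signals (standing in for what a trained network would predict from the noisy input alone), the synthetic signals, and all helper names are assumptions for illustration. `simple_sdr` uses the plain energy-ratio definition; SDR as defined in BSS Eval, which is commonly used for this kind of evaluation, additionally allows a short distortion filter on the reference, so the numbers here are not comparable to reported results.

```python
import numpy as np

def frame_signal(x, frame_len):
    """Split x into non-overlapping frames; trailing samples that don't fill a frame are dropped."""
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)

def spectrogram(x, frame_len):
    """Complex spectrogram: one real FFT per non-overlapping rectangular frame."""
    return np.fft.rfft(frame_signal(x, frame_len), axis=1)

def inverse_spectrogram(spec, frame_len):
    """Invert spectrogram(): inverse real FFT per frame, then concatenate the frames."""
    return np.fft.irfft(spec, n=frame_len, axis=1).reshape(-1)

def oracle_ratio_mask(clean, noise, frame_len):
    """ASSUMPTION: 'ideal' mask |S|^2 / (|S|^2 + |N|^2) built from the true signals.
    A trained network would instead estimate a mask from the noisy spectrogram."""
    s_pow = np.abs(spectrogram(clean, frame_len)) ** 2
    n_pow = np.abs(spectrogram(noise, frame_len)) ** 2
    return s_pow / (s_pow + n_pow + 1e-12)

def simple_sdr(reference, estimate):
    """Plain SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2) (no BSS Eval distortion filter)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    err = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(err ** 2))

# Hypothetical test signal: two tones plus white noise, 1 second at 16 kHz.
rng = np.random.default_rng(0)
sr = 16000
frame_len = 512
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
noise = 0.5 * rng.standard_normal(len(t))
noisy = clean + noise

# Apply the mask in the spectrogram domain, then resynthesize.
mask = oracle_ratio_mask(clean, noise, frame_len)
enhanced = inverse_spectrogram(mask * spectrogram(noisy, frame_len), frame_len)

n = len(enhanced)  # framing drops the trailing samples that don't fill a frame
print(f"noisy SDR:    {simple_sdr(clean[:n], noisy[:n]):.1f} dB")
print(f"enhanced SDR: {simple_sdr(clean[:n], enhanced):.1f} dB")
```

Because the spectrogram here is linear, the noisy spectrogram is exactly the sum of the clean and noise spectrograms, and an all-ones mask would reproduce the noisy input; any SDR gain comes from the mask attenuating noise-dominated time-frequency bins.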

[CITATION] Exploring tradeoffs in models for low-latency speech enhancement

B Patton, J Skoglund, J Thorpe, J Hershey… - Proceedings of the …, 2018 - research.google
We examine trade-offs among non-causal lookahead, computation, and parameter count
versus enhancement performance and find that zero-lookahead models can achieve, on
average, performance only 0.5 dB worse than our best bidirectional model. Further, we
find that 200 milliseconds of lookahead is sufficient to achieve performance within
about 0.2 dB of our best bidirectional model.
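To make the lookahead trade-off concrete, here is a minimal sketch assuming a frame-based model with a hypothetical 10 ms hop at 16 kHz (neither value comes from the abstract). It converts a lookahead budget into a number of future frames and builds a boolean mask of which input frames each output frame may depend on: zero lookahead gives a causal model, and `None` stands in for a bidirectional model that sees the whole utterance. The function names are illustrative, not from the paper.

```python
import numpy as np

def lookahead_frames(lookahead_ms: int, hop_ms: int, sample_rate: int) -> int:
    """Number of whole future frames that fit in a lookahead budget.

    Works in integer sample counts to avoid floating-point rounding.
    """
    lookahead_samples = lookahead_ms * sample_rate // 1000
    hop_samples = hop_ms * sample_rate // 1000
    return lookahead_samples // hop_samples

def visibility_mask(n_frames, lookahead):
    """allowed[t, j] is True when the output for frame t may depend on input frame j.

    lookahead=0    -> causal (current and past frames only)
    lookahead=k    -> current, past, and up to k future frames
    lookahead=None -> bidirectional (whole utterance visible)
    """
    if lookahead is None:
        return np.ones((n_frames, n_frames), dtype=bool)
    t = np.arange(n_frames)[:, None]  # output frame index (rows)
    j = np.arange(n_frames)[None, :]  # input frame index (columns)
    return j <= t + lookahead

# Hypothetical framing: 10 ms hop at 16 kHz.
k = lookahead_frames(200, hop_ms=10, sample_rate=16000)
print(k)
print(visibility_mask(5, 0).astype(int))  # causal: lower-triangular
print(visibility_mask(5, 1).astype(int))  # one frame of lookahead
```

Under these assumptions, 200 ms of lookahead corresponds to 20 future frames, and each frame of lookahead delays every output by one hop, which is the latency cost weighed against the SDR gap to the bidirectional model.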