Improving Design of Input Condition Invariant Speech Enhancement

Zhang, Wangyou; Jung, Jee-weon; Watanabe, Shinji; Qian, Yanmin

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2401.14271 (eess)

[Submitted on 25 Jan 2024 (v1), last revised 16 Feb 2024 (this version, v2)]

Abstract: Building a single universal speech enhancement (SE) system that can handle arbitrary input is an in-demand but underexplored research topic. Toward this ultimate goal, one direction is to build a single model that handles diverse audio durations, sampling frequencies, and microphone variations in noisy and reverberant scenarios, which we define here as "input condition invariant SE". Such a model was recently proposed and showed promising performance; however, its multi-channel performance degraded severely in real conditions. In this paper, we propose novel architectures to improve the input condition invariant SE model so that performance in simulated conditions remains competitive while the degradation in real conditions is substantially mitigated. For this purpose, we redesign the key components that comprise such a system. First, we identify that the channel-modeling module's generalization to unseen scenarios can be sub-optimal, and we redesign this module. We further introduce a two-stage training strategy to enhance training efficiency. Second, we propose two novel dual-path time-frequency blocks, which demonstrate superior performance with fewer parameters and lower computational cost than the existing method. With all proposals combined, experiments on various public datasets validate the efficacy of the proposed model, with significantly improved performance in real conditions. The recipe with full model details is released at this https URL.
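The abstract names dual-path time-frequency blocks without giving their internals. As a rough illustration of the general dual-path idea only (not the blocks proposed in the paper), the sketch below alternates a pass along the frequency axis with a pass along the time axis of a time-frequency grid. All names here (`dual_path_block`, `center`, and the helper functions) are hypothetical, and `center` is a toy stand-in for whatever learned sequence model a real block would use.

```python
from typing import Callable, List

Grid = List[List[float]]  # grid[t][f]: T time frames x F frequency bins
SeqFn = Callable[[List[float]], List[float]]


def apply_along_frequency(grid: Grid, fn: SeqFn) -> Grid:
    """Frequency path: run fn over the F bins of each time frame independently."""
    return [fn(list(frame)) for frame in grid]


def apply_along_time(grid: Grid, fn: SeqFn) -> Grid:
    """Time path: run fn over the T frames of each frequency bin independently."""
    columns = [fn(list(col)) for col in zip(*grid)]  # transpose, then process
    return [list(row) for row in zip(*columns)]      # transpose back to [t][f]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum, used here as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def center(seq: List[float]) -> List[float]:
    """Toy stand-in for a learned sequence model: subtract the sequence mean."""
    mean = sum(seq) / len(seq)
    return [x - mean for x in seq]


def dual_path_block(grid: Grid, fn: SeqFn = center) -> Grid:
    """One hypothetical block: frequency path, then time path, each with a residual."""
    grid = add(grid, apply_along_frequency(grid, fn))
    grid = add(grid, apply_along_time(grid, fn))
    return grid


if __name__ == "__main__":
    # The same block accepts grids of different sizes without modification.
    short = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]       # T=3, F=2
    longer = [[float(t + f) for f in range(5)] for t in range(7)]  # T=7, F=5
    for g in (short, longer):
        out = dual_path_block(g)
        print(len(out), len(out[0]))
```

Because each path treats one axis as a sequence and the other as a batch, nothing in the block depends on the number of frames or bins, which is one reason this family of designs suits inputs of varying duration and sampling frequency.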

Comments:	Accepted by ICASSP 2024, 5 pages, 2 figures, 3 tables (corrected the results of no processing on CHiME-4 (Simu) in Table 2)
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2401.14271 [eess.AS]
	(or arXiv:2401.14271v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2401.14271

Submission history

From: Wangyou Zhang
[v1] Thu, 25 Jan 2024 16:07:17 UTC (948 KB)
[v2] Fri, 16 Feb 2024 04:50:22 UTC (948 KB)
