Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation

Boeddeker, Christoph; Zhang, Wangyou; Nakatani, Tomohiro; Kinoshita, Keisuke; Ochiai, Tsubasa; Delcroix, Marc; Kamo, Naoyuki; Qian, Yanmin; Haeb-Umbach, Reinhold

arXiv:2011.15003 (cs)

[Submitted on 30 Nov 2020 (v1), last revised 8 Jun 2021 (this version, v4)]

Abstract: Time-domain training criteria have proven to be very effective for the separation of single-channel non-reverberant speech mixtures. Likewise, mask-based beamforming has shown impressive performance in multi-channel reverberant speech enhancement and source separation. Here, we propose to combine neural-network-supported multi-channel source separation with a time-domain training objective function. For the objective, we propose to use a convolutive transfer function invariant Signal-to-Distortion Ratio (CI-SDR) based loss. While this is a well-known evaluation metric (BSS Eval), it has not been used as a training objective before. To show its effectiveness, we demonstrate the performance on LibriSpeech-based reverberant mixtures. On this task, the proposed system approaches the error rate obtained on single-source non-reverberant input, i.e., LibriSpeech test_clean, with a difference of only 1.2 percentage points, thus outperforming a system based on conventional permutation invariant training and alternative objectives like Scale-Invariant Signal-to-Distortion Ratio by a large margin.
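
As a rough illustration of the idea behind a convolutive transfer function invariant SDR, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the BSS Eval style definition in which the estimate is projected onto delayed copies of the reference signal, so that any short FIR filtering of the reference is not counted as distortion. The function name `ci_sdr`, the default `filter_length` of 16, the `eps` guard, and the truncation of the delayed copies to the signal length are all assumptions made for brevity (BSS Eval implementations commonly use 512 taps). With `filter_length=1` the projection reduces to a single scale factor, which corresponds to the scale-invariant SDR mentioned in the abstract.

```python
import numpy as np


def ci_sdr(reference, estimate, filter_length=16, eps=1e-12):
    """Sketch of a filter-invariant SDR in dB (hypothetical helper).

    The estimate is projected onto the span of the reference delayed by
    0 .. filter_length - 1 samples. Whatever the projection explains is
    treated as target, the remainder as distortion.
    """
    n = len(reference)
    # Each column is the reference delayed by k samples, zero-padded at the start.
    delayed = np.stack(
        [np.concatenate([np.zeros(k), reference[: n - k]])
         for k in range(filter_length)],
        axis=1,
    )
    # Least-squares FIR filter that best maps the reference to the estimate.
    fir, *_ = np.linalg.lstsq(delayed, estimate, rcond=None)
    target = delayed @ fir
    distortion = estimate - target
    return 10 * np.log10(np.sum(target ** 2) / (np.sum(distortion ** 2) + eps))


# Toy check: the estimate is a filtered reference plus a little noise.
rng = np.random.default_rng(0)
ref = rng.standard_normal(4000)
est = np.convolve(ref, [1.0, 0.5, 0.25])[:4000] + 0.01 * rng.standard_normal(4000)

ci_value = ci_sdr(ref, est, filter_length=16)  # filtering is forgiven
si_value = ci_sdr(ref, est, filter_length=1)   # only scaling is forgiven
```

In this toy setup the filter-invariant value is much higher than the scale-invariant one, because the short filter applied to the reference is absorbed into the target rather than penalized. To use such a quantity as a training loss one would negate it and implement it in a framework with automatic differentiation; that step is not shown here.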

Comments:	Accepted by ICASSP 2021
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2011.15003 [cs.SD]
	(or arXiv:2011.15003v4 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2011.15003

Submission history

From: Christoph Boeddeker [view email]
[v1] Mon, 30 Nov 2020 17:08:19 UTC (134 KB)
[v2] Tue, 1 Dec 2020 08:43:20 UTC (134 KB)
[v3] Mon, 7 Jun 2021 15:02:05 UTC (57 KB)
[v4] Tue, 8 Jun 2021 07:40:08 UTC (57 KB)
