Robustness of Speech Separation Models for Similar-pitch Speakers

Lay, Bunlong; Zaczek, Sebastian; Tesch, Kristina; Gerkmann, Timo

arXiv:2407.15749 (eess)

[Submitted on 22 Jul 2024]

Abstract: Single-channel speech separation is a crucial task for enhancing speech recognition systems in multi-speaker environments. This paper investigates the robustness of state-of-the-art neural network models in scenarios where the pitch differences between speakers are minimal. Building on earlier findings by Ditter and Gerkmann, which identified a significant performance drop for the 2018 Chimera++ model under similar-pitch conditions, our study extends the analysis to more recent and sophisticated neural network models. Our experiments reveal that modern models have substantially reduced the performance gap for matched training and testing conditions. However, a substantial performance gap persists under mismatched conditions: models perform well for large pitch differences but worse when the speakers' pitches are similar. These findings motivate further research into the generalizability of speech separation models to similar-pitch speakers and unseen data.

Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Signal Processing (eess.SP)
Cite as:	arXiv:2407.15749 [eess.AS]
	(or arXiv:2407.15749v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2407.15749

Submission history

From: Bunlong Lay
[v1] Mon, 22 Jul 2024 15:55:08 UTC (129 KB)