MLAAD: The Multi-Language Audio Anti-Spoofing Dataset

Müller, Nicolas M.; Kawa, Piotr; Choong, Wei Herng; Casanova, Edresson; Gölge, Eren; Müller, Thorsten; Syga, Piotr; Sperl, Philip; Böttinger, Konstantin

arXiv:2401.09512 (cs)

[Submitted on 17 Jan 2024 (v1), last revised 2 Nov 2024 (this version, v5)]

Abstract: Text-to-Speech (TTS) technology offers notable benefits, such as providing a voice for individuals with speech impairments, but it also facilitates the creation of audio deepfakes and spoofing attacks. AI-based detection methods can help mitigate these risks; however, the performance of such models inherently depends on the quality and diversity of their training data. Currently available datasets are heavily skewed towards English and Chinese audio, which limits the global applicability of anti-spoofing systems. To address this limitation, this paper presents the Multi-Language Audio Anti-Spoofing Dataset (MLAAD), created using 82 TTS models spanning 33 different architectures to generate 378.0 hours of synthetic speech in 38 languages. We train and evaluate three state-of-the-art deepfake detection models on MLAAD and observe that, when used as a training resource, it yields better performance than comparable datasets such as InTheWild and FakeOrReal. Moreover, compared to the widely used ASVspoof 2019 dataset, MLAAD proves to be a complementary resource: in tests across eight datasets, MLAAD and ASVspoof 2019 each yielded the better-performing model on four. By publishing MLAAD and making a trained model accessible via an interactive web server, we aim to democratize anti-spoofing technology, make it accessible beyond the realm of specialists, and contribute to global efforts against audio spoofing and deepfakes.
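The abstract states that models trained on MLAAD and on ASVspoof 2019 each won on four of eight test sets, but does not name the comparison metric. The following is a minimal sketch, assuming the equal error rate (EER) that is common in anti-spoofing work. The function names `equal_error_rate` and `tally_wins`, the label convention (1 = spoof, 0 = bona fide), and the simple threshold sweep are hypothetical illustrative choices, not the authors' evaluation code.

```python
from typing import Dict, Sequence, Tuple


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumed convention (hypothetical): label 1 = spoof, 0 = bona fide, and a
    higher score means "more likely spoof". A sample is flagged as spoof
    when score >= threshold.
    """
    spoof = [s for s, y in zip(scores, labels) if y == 1]
    bona = [s for s, y in zip(scores, labels) if y == 0]
    if not spoof or not bona:
        raise ValueError("need at least one spoof and one bona fide sample")

    best_gap, best_eer = float("inf"), 1.0
    # Adding +inf also covers the "flag nothing as spoof" operating point.
    for t in sorted(set(scores)) + [float("inf")]:
        miss = sum(s < t for s in spoof) / len(spoof)        # spoof accepted
        false_alarm = sum(s >= t for s in bona) / len(bona)  # bona fide rejected
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, best_eer = gap, (miss + false_alarm) / 2
    return best_eer


def tally_wins(eer_a: Dict[str, float], eer_b: Dict[str, float]) -> Tuple[int, int]:
    """Count test sets on which model A (resp. B) has the strictly lower EER.

    Both dicts map a test-set name to an EER; eer_b must contain every key
    of eer_a. Ties count for neither model.
    """
    wins_a = sum(eer_a[k] < eer_b[k] for k in eer_a)
    wins_b = sum(eer_b[k] < eer_a[k] for k in eer_a)
    return wins_a, wins_b
```

Under this reading, "each excelling on four datasets" corresponds to `tally_wins` returning `(4, 4)` over eight test sets. A production evaluation would typically compute the EER from a ROC curve with interpolation rather than this exhaustive sweep, which is kept only for readability.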

Comments:	IJCNN 2024
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2401.09512 [cs.SD]
	(or arXiv:2401.09512v5 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2401.09512

Submission history

From: Nicolas Michael Müller
[v1] Wed, 17 Jan 2024 15:09:02 UTC (1,721 KB)
[v2] Wed, 28 Feb 2024 19:07:09 UTC (1,953 KB)
[v3] Tue, 16 Apr 2024 11:25:18 UTC (2,185 KB)
[v4] Tue, 24 Sep 2024 07:44:30 UTC (2,666 KB)
[v5] Sat, 2 Nov 2024 11:56:17 UTC (2,514 KB)