Deep Learning Based Assessment of Synthetic Speech Naturalness

Mittag, Gabriel; Möller, Sebastian

doi:10.21437/Interspeech.2020-2382

arXiv:2104.11673v1 (cs)

[Submitted on 23 Apr 2021]

Authors: Gabriel Mittag, Sebastian Möller

Abstract: In this paper, we present a new objective prediction model for synthetic speech naturalness. It can be used to evaluate Text-To-Speech or Voice Conversion systems and works independently of the language. The model is trained end-to-end and is based on a CNN-LSTM network that was previously shown to give good results for speech quality estimation. We trained and tested the model on 16 different datasets, including data from the Blizzard Challenge and the Voice Conversion Challenge. Further, we show that the reliability of deep learning-based naturalness prediction can be improved by transfer learning from speech quality prediction models that are trained on objective POLQA scores. The proposed model is made publicly available and can, for example, be used to evaluate different TTS system configurations.
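As a rough illustration of the last use case, the sketch below shows one way per-utterance naturalness predictions could be aggregated to compare TTS system configurations. It is not the authors' code: the function name `rank_systems`, the configuration labels, and the score values are hypothetical, and averaging per system is an assumption about how such predictions might be summarized.

```python
from collections import defaultdict
from statistics import mean


def rank_systems(utterance_scores):
    """Average predicted scores per system and sort the systems best first.

    utterance_scores: iterable of (system_id, predicted_score) pairs, where
    each score is assumed to come from some naturalness prediction model.
    """
    by_system = defaultdict(list)
    for system_id, score in utterance_scores:
        by_system[system_id].append(score)
    averages = {system: mean(scores) for system, scores in by_system.items()}
    # Higher predicted naturalness ranks first.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical predictions (made-up values, for illustration only).
predictions = [
    ("config_a", 3.0), ("config_a", 4.0),
    ("config_b", 4.0), ("config_b", 4.5),
]
ranking = rank_systems(predictions)
```

Here `ranking` lists each configuration with its mean predicted score. In practice one would also want enough utterances per system for the averages to be stable.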

Comments:	Late upload, presented at Interspeech 2020
Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2104.11673 [cs.SD]
	(or arXiv:2104.11673v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2104.11673
Related DOI:	https://doi.org/10.21437/Interspeech.2020-2382

Submission history

From: Gabriel Mittag
[v1] Fri, 23 Apr 2021 16:05:20 UTC (46 KB)
