The VIBVG Speech Synthesis System for Blizzard Challenge 2023

Y Lu, R Fu, X Qi, Z Wen, J Tao, J Yi, T Wang, Y Ren, C Zhang, C Yang, W Shi

Proc. 18th Blizzard Challenge Workshop, isca-archive.org

Abstract

This paper describes the VIBVG end-to-end neural text-to-speech (TTS) synthesis system entered in the Blizzard Challenge 2023. One objective of the challenge is to synthesize natural, high-quality audio; another is to generate audio that closely resembles the speech of the target speaker. Our speech synthesis system is built on VITS, a multi-speaker end-to-end speech synthesis system. Unlike the original VITS, we use BigVGAN instead of HiFi-GAN as the decoder to improve the quality of the synthesized speech. Furthermore, to improve the naturalness of the synthesized speech, we conducted a comparative analysis of several French grapheme-to-phoneme (g2p) methods and applied certain modifications to the generated French phonemes. In this paper, we present and discuss the overall system architecture and the data pruning method, and we introduce the important components of each task. Finally, we present the results of the listening test and analyze them.