The VIBVG Speech Synthesis System for Blizzard Challenge 2023

Y Lu, R Fu, X Qi, Z Wen, J Tao, J Yi, T Wang, Y Ren, C Zhang, C Yang, W Shi
Proc. 18th Blizzard Challenge Workshop (isca-archive.org)
Abstract
This paper describes the VIBVG end-to-end neural text-to-speech (TTS) synthesis system entered in Blizzard Challenge 2023. One objective of the challenge is to synthesize natural, high-quality audio; another is to generate audio that closely resembles the speech of the target speaker. Our system is built on VITS, a multispeaker end-to-end speech synthesis system. Diverging from VITS, we use BigVGAN as the decoder in place of HiFi-GAN to enhance the quality of the synthesized speech. Furthermore, to improve naturalness, we compared several French grapheme-to-phoneme (g2p) methods and applied certain modifications to the generated French phonemes. In this paper, we present and discuss the overall system structure and the data pruning method, and we introduce the important components for each task. Finally, we present the results of the listening test and analyze them.