11th SSW 2021: Budapest, Hungary
- Géza Németh (ed.): 11th ISCA Speech Synthesis Workshop, SSW 2021, Budapest, Hungary, August 26-28, 2021. ISCA 2021
Session 1: Special synthesis problems
- Sai Sirisha Rallabandi, Babak Naderi, Sebastian Möller: Identifying the vocal cues of likeability, friendliness and skilfulness in synthetic speech. 1-6
- Tamás Gábor Csapó: Extending Text-to-Speech Synthesis with Articulatory Movement Prediction using Ultrasound Tongue Imaging. 7-12
- Martin Lenglet, Olivier Perrotin, Gérard Bailly: Impact of Segmentation and Annotation in French end-to-end Synthesis. 13-18
- Marc Illa, Bence Mark Halpern, Rob van Son, Laureano Moro-Velázquez, Odette Scharenborg: Pathological voice adaptation with autoencoder-based voice conversion. 19-24
- Elijah Gutierrez, Pilar Oplustil Gallegos, Catherine Lai: Location, Location: Enhancing the Evaluation of Text-to-Speech synthesis using the Rapid Prosody Transcription Paradigm. 25-30
Session 2: Articulation and speech styles
- Tamás Gábor Csapó, László Tóth, Gábor Gosztolya, Alexandra Markó: Speech Synthesis from Text and Ultrasound Tongue Image-based Articulatory Input. 31-36
- Javier Latorre, Charlotte Bailleul, Tuuli Morrill, Alistair Conkie, Yannis Stylianou: Combining speakers of multiple languages to improve quality of neural voices. 37-42
- Christina Tånnander, Jens Edlund: Methods of slowing down speech. 43-47
- Joakim Gustafson, Jonas Beskow, Éva Székely: Personality in the mix - investigating the contribution of fillers and speaking style to the perception of spontaneous speech synthesis. 48-53
- Csaba Zainkó, László Tóth, Amin Honarmandi Shandiz, Gábor Gosztolya, Alexandra Markó, Géza Németh, Tamás Gábor Csapó: Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging. 54-59
Session 3: Expressive synthesis
- Bastian Schnell, Philip N. Garner: Improving Emotional TTS with an Emotion Intensity Input from Unsupervised Extraction. 60-65
- Slava Shechtman, Avrech Ben-David: Acquiring conversational speaking style from multi-speaker spontaneous dialog corpus for prosody-controllable sequence-to-sequence speech synthesis. 66-71
- Bastian Schnell, Goeric Huybrechts, Bartek Perz, Thomas Drugman, Jaime Lorenzo-Trueba: EmoCat: Language-agnostic Emotional Voice Conversion. 72-77
- Abdelhamid Ezzerg, Adam Gabrys, Bartosz Putrycz, Daniel Korzekwa, Daniel Saez-Trigueros, David McHardy, Kamil Pokora, Jakub Lachowicz, Jaime Lorenzo-Trueba, Viacheslav Klimkov: Enhancing audio quality for expressive Neural Text-to-Speech. 78-83
- Lucas H. Ueda, Paula D. P. Costa, Flávio Olmos Simões, Mário Uliani Neto: Are we truly modeling expressiveness? A study on expressive TTS in Brazilian Portuguese for real-life application styles. 84-89
Session 4: Articulation and Naturalness
- Debasish Ray Mohapatra, Pramit Saha, Yadong Liu, Bryan Gick, Sidney Fels: Vocal tract area function extraction using ultrasound for articulatory speech synthesis. 90-95
- Raahil Shah, Kamil Pokora, Abdelhamid Ezzerg, Viacheslav Klimkov, Goeric Huybrechts, Bartosz Putrycz, Daniel Korzekwa, Thomas Merritt: Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech. 96-101
- Paul Konstantin Krug, Simon Stone, Peter Birkholz: Intelligibility and naturalness of articulatory synthesis with VocalTractLab compared to established speech synthesis technologies. 102-107
- Ambika Kirkland, Marcin Wlodarczak, Joakim Gustafson, Éva Székely: Perception of smiling voice in spontaneous speech synthesis. 108-112
- Alejandro Mottini, Jaime Lorenzo-Trueba, Sri Vishnu Kumar Karlapati, Thomas Drugman: Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments. 113-117
Session 5: Emotion, singing and voice conversion
- Konstantinos Markopoulos, Nikolaos Ellinas, Alexandra Vioni, Myrsini Christidou, Panos Kakoulidis, Georgios Vamvoukakis, June Sig Sung, Hyoungmin Park, Pirros Tsiakoulis, Aimilios Chalamandaris, Georgia Maniati: Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control. 118-123
- Jennifer Williams, Jason Fong, Erica Cooper, Junichi Yamagishi: Exploring Disentanglement with Multilingual and Monolingual VQ-VAE. 124-129
- Erica Cooper, Xin Wang, Junichi Yamagishi: Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis. 130-135
- Hieu-Thi Luong, Junichi Yamagishi: Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance. 136-141
- Patrick Lumban Tobing, Tomoki Toda: Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction. 142-147
Session 6: Multilingual and evaluation
- Johannah O'Mahony, Pilar Oplustil Gallegos, Catherine Lai, Simon King: Factors Affecting the Evaluation of Synthetic Speech in Context. 148-153
- Arun Baby, Pranav Jawale, Saranya Vinnaitherthan, Sumukh Badam, Nagaraj Adiga, Sharath Adavanne: Non-native English lexicon creation for bilingual speech synthesis. 154-159
- Dan Wells, Korin Richmond: Cross-lingual Transfer of Phonological Features for Low-resource Speech Synthesis. 160-165
- Ayushi Pandey, Sébastien Le Maguer, Julie Carson-Berndsen, Naomi Harte: Mind your p's and k's - Comparing obstruents across TTS voices of the Blizzard Challenge 2013. 166-171
- Jason Fong, Jilong Wu, Prabhav Agrawal, Andrew Gibiansky, Thilo Köhler, Qing He: Improving Polyglot Speech Synthesis through Multi-task and Adversarial Learning. 172-176
Session 7: Modeling and evaluation
- Ammar Abbas, Bajibabu Bollepalli, Alexis Moinet, Arnaud Joly, Penny Karanasou, Peter Makarov, Simon Slangen, Sri Karlapati, Thomas Drugman: Multi-Scale Spectrogram Modelling for Neural Text-to-Speech. 177-182
- Erica Cooper, Junichi Yamagishi: How do Voices from Past Speech Synthesis Challenges Compare Today? 183-188
- Kazuya Yufune, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari: Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder. 189-194
- Jason Taylor, Sébastien Le Maguer, Korin Richmond: Liaison and Pronunciation Learning in End-to-End Text-to-Speech in French. 195-199
- Qiao Tian, Chao Liu, Zewang Zhang, Heng Lu, Linghui Chen, Bin Wei, Pujiang He, Shan Liu: FeatherTTS: Robust and Efficient attention based Neural TTS. 200-204
Session 8: Synthesis and Context
- Pilar Oplustil Gallegos, Johannah O'Mahony, Simon King: Comparing acoustic and textual representations of previous linguistic context for improving Text-to-Speech. 205-210
- Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Naoko Tanji, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari: Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings. 211-215
- Mano Ranjith Kumar M., Jom Kuriakose, Karthik Pandia D. S, Hema A. Murthy: Lipsyncing efforts for transcreating lecture videos in Indian languages. 216-221
- Marco Nicolis, Viacheslav Klimkov: Homograph disambiguation with contextual word embeddings for TTS systems. 222-226
- Jason Fong, Jennifer Williams, Simon King: Analysing Temporal Sensitivity of VQ-VAE Sub-Phone Codebooks. 227-231