17th Interspeech 2016: San Francisco, CA, USA
- Nelson Morgan:
17th Annual Conference of the International Speech Communication Association, Interspeech 2016, San Francisco, CA, USA, September 8-12, 2016. ISCA 2016
Keynote 1: ISCA Medalist: John Makhoul
- John Makhoul:
A 50-Year Retrospective on Speech and Language Processing. 1
Neural Networks in Speech Recognition
- Ivan Medennikov, Alexey Prudnikov, Alexander Zatvornitskiy:
Improving English Conversational Telephone Speech Recognition. 2-6
- George Saon, Tom Sercu, Steven J. Rennie, Hong-Kwang Jeff Kuo:
The IBM 2016 English Conversational Telephone Speech Recognition System. 7-11
- Liang Lu, Steve Renals:
Small-Footprint Deep Neural Networks with Highway Connections for Speech Recognition. 12-16
- Dong Yu, Wayne Xiong, Jasha Droppo, Andreas Stolcke, Guoli Ye, Jinyu Li, Geoffrey Zweig:
Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention. 17-21
- Golan Pundak, Tara N. Sainath:
Lower Frame Rate Neural Network Acoustic Models. 22-26
- Gakuto Kurata, Brian Kingsbury:
Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling. 27-31
Special Session: Auditory-Visual Expressive Speech and Gesture in Humans and Machines
- Lei Chen, Gary Feng, Michelle P. Martin-Raugh, Chee Wee Leong, Christopher Kitchen, Su-Youn Yoon, Blair Lehman, Harrison Kell, Chong Min Lee:
Automatic Scoring of Monologue Video Interviews Using Multimodal Cues. 32-36
- Chee Seng Chong, Jeesun Kim, Chris Davis:
The Sound of Disgust: How Facial Expression May Influence Speech Production. 37-41
- Zhaojun Yang, Shrikanth S. Narayanan:
Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions. 42-46
- Attigodu C. Ganesh, Frédéric Berthommier, Jean-Luc Schwartz:
Audiovisual Speech Scene Analysis in the Context of Competing Sources. 47-51
- Najmeh Sadoughi, Carlos Busso:
Head Motion Generation with Synthetic Speech: A Data Driven Approach. 52-56
- Jeesun Kim, Chris Davis:
The Consistency and Stability of Acoustic and Visual Cues for Different Prosodic Attitudes. 57-61
- Jeesun Kim, Gérard Bailly:
Introduction to Poster Presentation of Part II.
Prosody
- Irene Vogel, Laura Spinu:
The Unit of Speech Encoding: The Case of Romanian. 62-66
- Jeanin Jügler, Frank Zimmerer, Jürgen Trouvain, Bernd Möbius:
The Perceptual Effect of L1 Prosody Transplantation on L2 Speech: The Case of French Accented German. 67-71
- Bijun Ling, Jie Liang:
Organizing Syllables into Sandhi Domains - Evidence from F0 and Duration Patterns in Shanghai Chinese. 72-76
- Neville Ryant, Mark Liberman:
Automatic Analysis of Phonetic Speech Style Dimensions. 77-81
- Angeliki Athanasopoulou, Irene Vogel:
The Acoustic Manifestation of Prominence in Stressless Languages. 82-86
- Wei Lai, Jiahong Yuan, Ya Li, Xiaoying Xu, Mark Liberman:
The Rhythmic Constraint on Prosodic Boundaries in Mandarin Chinese Based on Corpora of Silent Reading and Speech Perception. 87-91
Speech and Language Processing for Clinical Health Applications
- Fu-Sheng Tsai, Ya-Ling Hsu, Wei-Chen Chen, Yi-Ming Weng, Chip-Jin Ng, Chi-Chun Lee:
Toward Development and Evaluation of Pain Level-Rating Scale for Emergency Triage based on Vocal Characteristics and Facial Expressions. 92-96
- Tan Lee, Yuanyuan Liu, Yu Ting Yeung, Thomas K. T. Law, Kathy Y. S. Lee:
Predicting Severity of Voice Disorder from DNN-HMM Acoustic Posteriors. 97-101
- Klaske E. van Sluis, Michiel W. M. van den Brekel, Frans J. M. Hilgers, Rob J. J. H. van Son:
Long-Term Stability of Tracheoesophageal Voices. 102-106
- Gábor Gosztolya, László Tóth, Tamás Grósz, Veronika Vincze, Ildikó Hoffmann, Gréta Szatlóczki, Magdolna Pákáski, János Kálmán:
Detecting Mild Cognitive Impairment from Spontaneous Speech by Correlation-Based Phonetic Feature Selection. 107-111
- Jen J. Gong, Maryann Gong, Dina Levy-Lambert, Jordan R. Green, Tiffany P. Hogan, John V. Guttag:
Towards an Automated Screening Tool for Developmental Speech and Language Impairments. 112-116
- Vikram C. M., Nagaraj Adiga, S. R. Mahadeva Prasanna:
Spectral Enhancement of Cleft Lip and Palate Speech. 117-121
Speech Coding and Audio Processing for Noise Reduction
- Tian Guan, Guangxing Chu, Fei Chen, Feng Yang:
Assessing Level-Dependent Segmental Contribution to the Intelligibility of Speech Processed by Single-Channel Noise-Suppression Algorithms. 122-125
- Tudor-Catalin Zorila, Sheila Flanagan, Brian C. J. Moore, Yannis Stylianou:
Effectiveness of Near-End Speech Enhancement Under Equal-Loudness and Equal-Level Constraints. 126-130
- Bidisha Sharma, S. R. Mahadeva Prasanna:
Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence. 131-135
- Lei Wang, Shufeng Zhu, Diliang Chen, Yong Feng, Fei Chen:
Relative Contributions of Amplitude and Phase to the Intelligibility Advantage of Ideal Binary Masked Sentences. 136-139
- Qingju Liu, Yan Tang, Philip J. B. Jackson, Wenwu Wang:
Predicting Binaural Speech Intelligibility from Signals Estimated by a Blind Source Separation Algorithm. 140-144
- Petko Nikolov Petkov, Norbert Braunschweiler, Yannis Stylianou:
Automated Pause Insertion for Improved Intelligibility Under Reverberation. 145-149
Speech Analysis
- Jean-Luc Rouas, Leonidas Ioannidis:
Automatic Classification of Phonation Modes in Singing Voice: Towards Singing Style Characterisation and Application to Ethnomusicological Recordings. 150-154
- Himanshu N. Bhavsar, Tanvina B. Patel, Hemant A. Patil:
Novel Nonlinear Prediction Based Features for Spoofed Speech Detection. 155-159
- Sri Harsha Dumpala, Bhanu Teja Nellore, Raghu Ram Nevali, Suryakanth V. Gangashetty, B. Yegnanarayana:
Robust Vowel Landmark Detection Using Epoch-Based Features. 160-164
- Johannes Töger, Yongwan Lim, Sajan Goud Lingala, Shrikanth S. Narayanan, Krishna S. Nayak:
Sensitivity of Quantitative RT-MRI Metrics of Vocal Tract Dynamics to Image Reconstruction Settings. 165-169
- Milos Cernak, Afsaneh Asaei, Pierre-Edouard Honnet, Philip N. Garner, Hervé Bourlard:
Sound Pattern Matching for Automatic Prosodic Event Detection. 170-174
- Mostafa Ali Shahin, Julien Epps, Beena Ahmed:
Automatic Classification of Lexical Stress in English and Arabic Languages Using Deep Learning. 175-179
First and Second Language Acquisition
- Fei Chen, Nan Yan, Xunan Huang, Hao Zhang, Lan Wang, Gang Peng:
Development of Mandarin Onset-Rime Detection in Relation to Age and Pinyin Instruction. 180-184
- Xinyi Wen, Yuan Jia:
Joint Effect of Dialect and Mandarin on English Vowel Production: A Case Study in Changsha EFL Learners. 185-189
- Tamami Katayama:
Effects of L1 Phonotactic Constraints on L2 Word Segmentation Strategies. 190-194
- Jane Wottawa, Martine Adda-Decker, Frédéric Isel:
Putting German [ʃ] and [ç] in Two Different Boxes: Native German vs L2 German of French Learners. 195-199
- Dean Luo, Ruxin Luo, Lixin Wang:
Naturalness Judgement of L2 English Through Dubbing Practice. 200-203
- Yasuaki Shinohara:
Audiovisual Training Effects for Japanese Children Learning English /r/-/l/. 204-207
- Sarah Harper, Louis Goldstein, Shrikanth S. Narayanan:
L2 Acquisition and Production of the English Rhotic Pharyngeal Gesture. 208-212
Speech and Hearing Disorders & Perception
- Alexandre Hennequin, Amélie Rochet-Capellan, Marion Dohen:
Auditory-Visual Perception of VCVs Produced by People with Down Syndrome: Preliminary Results. 213-217
- Emre Yilmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik:
Combining Non-Pathological Data of Different Language Varieties to Improve DNN-HMM Performance on Pathological Speech. 218-222
- Imed Laaridh, Corinne Fredouille, Christine Meunier:
Evaluation of a Phone-Based Anomaly Detection Approach for Dysarthric Speech. 223-227
- Chitralekha Bhat, Bhavik Vachhani, Sunil Kumar Kopparapu:
Recognition of Dysarthric Speech Using Voice Parameters for Speaker Adaptation and Multi-Taper Spectral Estimation. 228-232
- Fei Chen, Nan Yan, Xiaojie Pan, Feng Yang, Zhuanzhuan Ji, Lan Wang, Gang Peng:
Impaired Categorical Perception of Mandarin Tones and its Relationship to Language Ability in Autism Spectrum Disorders. 233-237
- Kathleen F. Nagle, James T. Heaton:
Perceived Naturalness of Electrolaryngeal Speech Produced Using sEMG-Controlled vs. Manual Pitch Modulation. 238-242
- Shamima Najnin, Bonny Banerjee, Lisa Lucks Mendel, Masoumeh Heidari Kapourchali, Jayanta Kumar Dutta, Sungmin Lee, Chhayakanta Patro, Monique Pousson:
Identifying Hearing Loss from Learned Speech Kernels. 243-247
- Panying Rong, Yana Yunusova, Jordan R. Green:
Differential Effects of Velopharyngeal Dysfunction on Speech Intelligibility During Early and Late Stages of Amyotrophic Lateral Sclerosis. 248-252
- Véronique Delvaux, Virginie Roland, Kathy Huet, Myriam Piccaluga, Marie-Claire Haelewyck, Bernard Harmegnies:
The Production of Intervocalic Glides in Non Dysarthric Parkinsonian Speech. 253-256
- Yang Feng, Zhang Lu:
Auditory Processing Impairments Under Background Noise in Children with Non-Syndromic Cleft Lip and/or Palate. 257-261
- Zhi Zhu, Ryota Miyauchi, Yukiko Araki, Masashi Unoki:
Modulation Spectral Features for Predicting Vocal Emotion Recognition by Simulated Cochlear Implants. 262-266
- Keiko Ochi, Koichi Mori, Naomi Sakai, Nobutaka Ono:
Automatic Discrimination of Soft Voice Onset Using Acoustic Features of Breathy Voicing. 267-271
- Jing Shao, Caicai Zhang, Gang Peng, Yike Yang, William S.-Y. Wang:
Effect of Noise on Lexical Tone Perception in Cantonese-Speaking Amusics. 272-276
- Yuki Takashima, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki, Nobuyuki Mitani, Kiyohiro Omori, Kaoru Nakazono:
Audio-Visual Speech Recognition Using Bimodal-Trained Bottleneck Features for a Person with Severe Hearing Loss. 277-281
- Yuling Gu, Boon Pang Lim, Nancy F. Chen:
Perception of Tone in Whispered Mandarin Sentences: The Case for Singapore Mandarin. 282-286
Speech Synthesis Poster
- Feng-Long Xie, Frank K. Soong, Haifeng Li:
A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences. 287-291
- Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki:
Parallel Dictionary Learning for Voice Conversion Using Discriminative Graph-Embedded Non-Negative Matrix Factorization. 292-296
- Yu Gu, Zhen-Hua Ling, Li-Rong Dai:
Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks. 297-301
- Yi Yang, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu:
Voice Conversion Based on Matrix Variate Gaussian Mixture Model Using Multiple Frame Features. 302-306
- Naoki Hosaka, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
Voice Conversion Based on Trajectory Model Training of Neural Networks Considering Global Variance. 307-311
- Sandesh Aryal, Ricardo Gutierrez-Osuna:
Comparing Articulatory and Acoustic Strategies for Reducing Non-Native Accents. 312-316
- Seyyed Saeed Sarfjoo, Cenk Demiroglu:
Cross-Lingual Speaker Adaptation for Statistical Speech Synthesis Using Limited Data. 317-321
- Lifa Sun, Hao Wang, Shiyin Kang, Kun Li, Helen M. Meng:
Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams. 322-326
- Anusha Prakash, Jeena J. Prakash, Hema A. Murthy:
Acoustic Analysis of Syllables Across Indian Languages. 327-331
- Teng Zhang, Zhipeng Chen, Ji Wu, Sam Lai, Wenhui Lei, Carsten Isert:
Objective Evaluation Methods for Chinese Text-To-Speech Systems. 332-336
- Yusuke Ijima, Taichi Asami, Hideyuki Mizuno:
Objective Evaluation Using Association Between Dimensions Within Spectral Features for Statistical Parametric Speech Synthesis. 337-341
- Takenori Yoshimura, Gustav Eje Henter, Oliver Watts, Mirjam Wester, Junichi Yamagishi, Keiichi Tokuda:
A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks. 342-346
- Monika Podsiadlo, Shweta Chahar:
Text-to-Speech for Individuals with Vision Loss: A User Study. 347-351
- Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki, Junichi Yamagishi:
Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks. 352-356
- Erica Cooper, Alison Chang, Yocheved Levitan, Julia Hirschberg:
Data Selection and Adaptation for Naturalness in HMM-Based Speech Synthesis. 357-361
Topics in Speech Processing
- Fei Tao, Louis Daudet, Christian Poellabauer, Sandra L. Schneider, Carlos Busso:
A Portable Automatic PA-TA-KA Syllable Detection System to Derive Biomarkers for Neurological Disorders. 362-366
- Omid Ghahabi, Antonio Bonafonte, Javier Hernando, Asunción Moreno:
Deep Neural Networks for i-Vector Language Identification of Short Utterances in Cars. 367-371
- Abraham Woubie, Jordi Luque, Javier Hernando:
Improving i-Vector and PLDA Based Speaker Clustering with Long-Term Features. 372-376
Show & Tell Session 1
- Aaron Lawson, Mitchell McLaren, Harry Bratt, Martin Graciarena, Horacio Franco, Christopher George, Allen R. Stauffer, Chris Bartels, Julien van Hout:
Open Language Interface for Voice Exploitation (OLIVE). 377-378
- Lubos Smídl, Adam Chýlek, Jan Svec:
A Multimodal Dialogue System for Air Traffic Control Trainees Based on Discrete-Event Simulation. 379-380
- Elodie Gauthier, David Blachon, Laurent Besacier, Guy-Noël Kouarata, Martine Adda-Decker, Annie Rialland, Gilles Adda, Grégoire Bachman:
Lig-Aikuma: A Mobile App to Collect Parallel Speech for Under-Resourced Language Studies. 381-382
- Martin Gruber, Jindrich Matousek, Zdenek Hanzlícek, Zdenek Krnoul, Zbynek Zajíc:
ARET - Automatic Reading of Educational Texts for Visually Impaired Students. 383-384
New Trends in Neural Networks for Speech Recognition
- Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals:
Segmental Recurrent Neural Networks for End-to-End Speech Recognition. 385-389
- Markus Nußbaum-Thom, Jia Cui, Bhuvana Ramabhadran, Vaibhava Goel:
Acoustic Modeling Using Bidirectional Gated Recurrent Convolutional Units. 390-394
- Wei-Ning Hsu, Yu Zhang, Ann Lee, James R. Glass:
Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition. 395-399
- Chunyang Wu, Penny Karanasou, Mark J. F. Gales, Khe Chai Sim:
Stimulated Deep Neural Network for Speech Recognition. 400-404
- Leonardo Badino:
Phonetic Context Embeddings for DNN-HMM Phone Recognition. 405-409
- Ying Zhang, Mohammad Pezeshki, Philémon Brakel, Saizheng Zhang, César Laurent, Yoshua Bengio, Aaron C. Courville:
Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks. 410-414
Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances
- Guangsen Wang, Kong-Aik Lee, Trung Hieu Nguyen, Hanwu Sun, Bin Ma:
Joint Speaker and Lexical Modeling for Short-Term Characterization of Speaker. 415-419
- Md. Jahangir Alam, Patrick Kenny, Vishwa Gupta:
Tandem Features for Text-Dependent Speaker Verification on the RedDots Corpus. 420-424
- Achintya Kumar Sarkar, Zheng-Hua Tan:
Text Dependent Speaker Verification Using Un-Supervised HMM-UBM and Temporal GMM-UBM. 425-429
- Tomi Kinnunen, Md. Sahidullah, Ivan Kukanov, Héctor Delgado, Massimiliano Todisco, Achintya Kumar Sarkar, Nicolai Bæk Thomsen, Ville Hautamäki, Nicholas W. D. Evans, Zheng-Hua Tan:
Utterance Verification for Text-Dependent Speaker Recognition: A Comparative Assessment Using the RedDots Corpus. 430-434
- Jianbo Ma, Saad Irtza, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah:
Parallel Speaker and Content Modelling for Text-Dependent Speaker Verification. 435-439
- Hossein Zeinali, Hossein Sameti, Lukás Burget, Jan Cernocký, Nooshin Maghsoodi, Pavel Matejka:
i-Vector/HMM Based Text-Dependent Speaker Verification System for RedDots Challenge. 440-444
- Rohan Kumar Das, Sarfaraz Jelil, S. R. Mahadeva Prasanna:
Exploring Session Variability and Template Aging in Speaker Verification for Fixed Phrase Short Utterances. 445-449
Articulatory Measurements and Analysis
- Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu:
Prediction of the Articulatory Movements of Unseen Phonemes of a Speaker Using the Speech Structure of Another Speaker. 450-454
- Ganesh Sivaraman, Vikramjit Mitra, Hosung Nam, Mark K. Tiede, Carol Y. Espy-Wilson:
Vocal Tract Length Normalization for Speaker Independent Acoustic-to-Articulatory Speech Inversion. 455-459
- Adam C. Lammert, Christine H. Shadle, Shrikanth S. Narayanan, Thomas F. Quatieri:
Investigation of Speed-Accuracy Tradeoffs in Speech Production Using Real-Time Magnetic Resonance Imaging. 460-464
- Tanner Sorensen, Asterios Toutios, Louis Goldstein, Shrikanth S. Narayanan:
Characterizing Vocal Tract Dynamics Across Speakers Using Real-Time MRI. 465-469
- Mathieu Labrunie, Pierre Badin, Dirk Voit, Arun A. Joseph, Laurent Lamalle, Coriandre Vilain, Louis-Jean Boë, Jens Frahm:
Tracking Contours of Orofacial Articulators from Real-Time MRI of Speech. 470-474
- Sajan Goud Lingala, Asterios Toutios, Johannes Töger, Yongwan Lim, Yinghua Zhu, Yoon-Chul Kim, Colin Vaz, Shrikanth S. Narayanan, Krishna S. Nayak:
State-of-the-Art MRI Protocol for Comprehensive Assessment of Vocal Tract Structure and Function. 475-479
Automatic Assessment of Emotions
- Rui Xia, Yang Liu:
DBN-ivector Framework for Acoustic Emotion Recognition. 480-484
- Brian Stasak, Julien Epps, Nicholas Cummins, Roland Goecke:
An Investigation of Emotional Speech in Depression Classification. 485-489
- Reza Lotfian, Carlos Busso:
Retrieving Categorical Emotions Using a Probabilistic Framework to Define Preference Learning Samples. 490-494
- Maximilian Schmitt, Fabien Ringeval, Björn W. Schuller:
At the Border of Acoustics and Linguistics: Bag-of-Audio-Words for the Recognition of Emotions in Speech. 495-499
- Arodami Chorianopoulou, Polychronis Koutsakis, Alexandros Potamianos:
Speech Emotion Recognition Using Affective Saliency. 500-504
- Rahul Gupta, Nishant Nath, Taruna Agrawal, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan:
Laughter Valence Prediction in Motivational Interviewing Based on Lexical and Acoustic Cues. 505-509
Acoustic and Articulatory Phonetics
- Marcin Wlodarczak, Mattias Heldner:
Respiratory Belts and Whistles: A Preliminary Study of Breathing Acoustics for Turn-Taking. 510-514
- Constantijn Kaland, Vincenzo Galatà, Lorenzo Spreafico, Alessandro Vietti:
/r/ as Language Marker in Bilingual Speech Production and Perception. 515-519
- Manfred Pützer, Frank Zimmerer, Wolfgang Wokurek, Jeanin Jügler:
Evaluation of Phonatory Behavior of German and French Speakers in Native and Non-Native Speech. 520-524
- Sofia Strömbergsson:
Today's Most Frequently Used F0 Estimation Methods, and Their Accuracy in Estimating Male and Female Pitch in Clean Speech. 525-529
- Lei He, Volker Dellwo:
A Praat-Based Algorithm to Extract the Amplitude Envelope and Temporal Fine Structure Using the Hilbert Transform. 530-534
- Ewald Enzinger:
Likelihood Ratio Calculation in Acoustic-Phonetic Forensic Voice Comparison: Comparison of Three Statistical Modelling Approaches. 535-539
Source Separation and Spatial Audio
- Xiaoke Qi, Jianhua Tao:
A Sparse Spherical Harmonic-Based Model in Subbands for Head-Related Transfer Functions. 540-544
- Yusuf Ziya Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey:
Single-Channel Multi-Speaker Separation Using Deep Clustering. 545-549
- Hao Li, Shuai Nie, Xueliang Zhang, Hui Zhang:
Jointly Optimizing Activation Coefficients of Convolutive NMF Using DNN for Speech Separation. 550-554
- Masood Delfarah, DeLiang Wang:
A Feature Study for Masking-Based Reverberant Speech Separation. 555-559
- Chung-Chien Hsu, Tai-Shih Chi, Jen-Tzung Chien:
Discriminative Layered Nonnegative Matrix Factorization for Speech Separation. 560-564
- Arpita Gang, Pravesh Biyani:
On Discriminative Framework for Single Channel Audio Source Separation. 565-569
Special Session: Auditory-Visual Expressive Speech and Gesture in Humans and Machines
- Qin Jin, Junwei Liang, Xiaozhu Lin:
Generating Natural Video Descriptions via Multimodal Processing. 570-574
- Martin Heckmann:
Feature-Level Decision Fusion for Audio-Visual Word Prominence Detection. 575-579
- Slim Ouni, Vincent Colotte, Sara Dahmani, Soumaya Azzi:
Acoustic and Visual Analysis of Expressive Speech: A Case Study of French Acted Speech. 580-584
- Adela Barbulescu, Rémi Ronfard, Gérard Bailly:
Characterization of Audiovisual Dramatic Attitudes. 585-589
- Yuyun Huang, Emer Gilmartin, Nick Campbell:
Conversational Engagement Recognition Using Auditory and Visual Cues. 590-594
- Theodora Chaspari, Jill Fain Lehman:
An Acoustic Analysis of Child-Child and Child-Robot Interactions for Understanding Engagement during Speech-Controlled Computer Games. 595-599
- Benjawan Kasisopa, Chutamanee Onsuwan, Charturong Tantibundhit, Nittayapa Klangpornkun, Suparak Techacharoenrungrueang, Sudaporn Luksaneeyanawin, Denis Burnham:
Auditory-Visual Lexical Tone Perception in Thai Elderly Listeners with and without Hearing Impairment. 600-604
- Hossein Khaki, Engin Erzin:
Use of Agreement/Disagreement Classification in Dyadic Interactions for Continuous Emotion Recognition. 605-609
Special Session: Intelligibility Under the Microscope
- Marc René Schädler, David Hülsmeier, Anna Warzybok, Sabine Hochmuth, Birger Kollmeier:
Microscopic Multilingual Matrix Test Predictions Using an ASR-Based Speech Recognition Model. 610-614
- Mats Exter, Bernd T. Meyer:
DNN-Based Automatic Speech Recognition as a Model for Human Phoneme Perception. 615-619
- Máté Attila Tóth, Martin Cooke:
Undoing Misperceptions: A Microscopic Analysis of Consistent Confusions Through Signal Modifications. 620-624
- Mahdie Karbasi, Ahmed Hussen Abdelaziz, Hendrik Meutzner, Dorothea Kolossa:
Blind Non-Intrusive Speech Intelligibility Prediction Using Twin-HMMs. 625-629
- Máté Attila Tóth, Martin Cooke, Jon Barker:
Misperceptions Arising from Speech-in-Babble Interactions. 630-634
- Anja Eichenauer, Mathias Dietz, Bernd T. Meyer, Tim Jürgens:
Introducing Temporal Rate Coding for Speech in Cochlear Implants: A Microscopic Evaluation in Humans and Models. 635-639
- María Luisa García Lecumberri, Jon Barker, Ricard Marxer, Martin Cooke:
Language Effects in Noise-Induced Word Misperceptions. 640-644
- Léo Varnet, Fanny Meunier, Michel Hoen:
Speech Reductions Cause a De-Weighting of Secondary Acoustic Cues. 645-649
- Lionel Fontan, Isabelle Ferrané, Jérôme Farinas, Julien Pinquier, Xavier Aumont:
Using Phonologically Weighted Levenshtein Distances for the Prediction of Microscopic Intelligibility. 650-654
- Mayuki Matsui:
The Impact of Manner of Articulation on the Intelligibility of Voicing Contrast in Noise: Cross-Linguistic Implications. 655-659
- Michael I. Mandel:
Directly Comparing the Listening Strategies of Humans and Machines. 660-664
Spoken Documents, Spoken Understanding and Semantic Analysis
- Marc-Antoine Rondeau, Yi Su:
LSTM-Based NeuroCRFs for Named Entity Recognition. 665-669
- Shih-Hung Liu, Kuan-Yu Chen, Yu-Lun Hsieh, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu:
Exploring Word Mover's Distance and Semantic-Aware Embedding Techniques for Extractive Broadcast News Summarization. 670-674
- Imran A. Sheikh, Irina Illina, Dominique Fohr, Georges Linarès:
Improved Neural Bag-of-Words Model to Retrieve Out-of-Vocabulary Words in Speech Recognition. 675-679
- Jérémy Trione, Benoît Favre, Frédéric Béchet:
Beyond Utterance Extraction: Summary Recombination for Speech Summarization. 680-684
- Bing Liu, Ian R. Lane:
Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling. 685-689
- Aaron Jaech, Larry P. Heck, Mari Ostendorf:
Domain Adaptation of Recurrent Neural Networks for Natural Language Understanding. 690-694
- Faisal Ladhak, Ankur Gandhe, Markus Dreyer, Lambert Mathias, Ariya Rastrow, Björn Hoffmeister:
LatticeRnn: Recurrent Neural Networks Over Lattices. 695-699
- Santosh Kesiraju, Lukás Burget, Igor Szöke, Jan Cernocký:
Learning Document Representations Using Subspace Multinomial Model. 700-704
- Zhiwei Zhao, Youzheng Wu:
Attention-Based Convolutional Neural Networks for Sentence Classification. 705-709
- Mohamed Morchid, Mohamed Bouaziz, Waad Ben Kheder, Killian Janod, Pierre-Michel Bousquet, Richard Dufour, Georges Linarès:
Spoken Language Understanding in a Latent Topic-Based Subspace. 710-714
- Dilek Hakkani-Tür, Gökhan Tür, Asli Celikyilmaz, Yun-Nung Chen, Jianfeng Gao, Li Deng, Ye-Yi Wang:
Multi-Domain Joint Semantic Frame Parsing Using Bi-Directional RNN-LSTM. 715-719
- Killian Janod, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori:
Deep Stacked Autoencoders for Spoken Language Understanding. 720-724
- Gakuto Kurata, Bing Xiang, Bowen Zhou:
Labeled Data Generation with Encoder-Decoder LSTM for Semantic Slot Filling. 725-729
- Sabrina Stehwien, Ngoc Thang Vu:
Exploring the Correlation of Pitch Accents and Semantic Slots for Spoken Language Understanding. 730-734
- Yaodong Tang, Zhiyong Wu, Helen M. Meng, Mingxing Xu, Lianhong Cai:
Analysis on Gated Recurrent Unit Based Question Detection Approach. 735-739
Spoken Term Detection
- Shuji Oishi, Tatsuya Matsuba, Mitsuaki Makino, Atsuhiko Kai:
Combining State-Level Spotting and Posterior-Based Acoustic Match for Improved Query-by-Example Spoken Term Detection. 740-744
- Zhiqiang Lv, Meng Cai, Wei-Qiang Zhang, Jia Liu:
A Novel Discriminative Score Calibration Method for Keyword Search. 745-749
- Jorge Proença, Fernando Perdigão:
Segmented Dynamic Time Warping for Spoken Query-by-Example Search. 750-754
- Shi-wook Lee, Kazuyo Tanaka, Yoshiaki Itoh:
Generating Complementary Acoustic Model Spaces in DNN-Based Sequence-to-Frame DTW Scheme for Out-of-Vocabulary Spoken Term Detection. 755-759
- Sankaran Panchapagesan, Ming Sun, Aparna Khare, Spyros Matsoukas, Arindam Mandal, Björn Hoffmeister, Shiv Vitaladevuni:
Multi-Task Learning and Weighted Cross-Entropy for DNN-Based Keyword Spotting. 760-764
- Yu-An Chung, Chao-Chung Wu, Chia-Hao Shen, Hung-yi Lee, Lin-Shan Lee:
Audio Word2Vec: Unsupervised Learning of Audio Segment Representations Using Sequence-to-Sequence Autoencoder. 765-769
- Zhong Meng, Biing-Hwang Juang:
Non-Uniform Boosted MCE Training of Deep Neural Networks for Keyword Spotting. 770-774
- Arseniy Gorin, Rasa Lileikyte, Guangpu Huang, Lori Lamel, Jean-Luc Gauvain, Antoine Laurent:
Language Model Data Augmentation for Keyword Spotting in Low-Resourced Training Conditions. 775-779
Show & Tell Session 2
- Lyan Verwimp, Brecht Desplanques, Kris Demuynck, Joris Pelemans, Marieke Lycke, Patrick Wambacq:
STON: Efficient Subtitling in Dutch Using State-of-the-Art Tools. 780-781
- Petr Stanislav, Lubos Smídl, Jan Svec:
An Automatic Training Tool for Air Traffic Control Training. 782-783
- Reima Karhila, Aku Rouhe, Peter Smit, André Mansikkaniemi, Heini Kallio, Erik Lindroos, Raili Hildén, Martti Vainio, Mikko Kurimo:
Digitala: An Augmented Test and Review Process Prototype for High-Stakes Spoken Foreign Language Examination. 784-785
- Géraldine Damnati, Delphine Charlet, Marc Denjean:
Exploring Collections of Multimedia Archives Through Innovative Interfaces in the Context of Digital Humanities. 786-787
Feature Extraction and Acoustic Modeling Using Neural Networks for ASR
- Yougen Yuan, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Learning Neural Network Representations Using Cross-Lingual Bottleneck Features with Word-Pair Information. 788-792
- Yuzong Liu, Katrin Kirchhoff:
Novel Front-End Features Based on Neural Graph Embeddings for DNN-HMM and LSTM-CTC Acoustic Modeling. 793-797
- Basil Abraham, Srinivasan Umesh, Neethu Mariam Joy:
Articulatory Feature Extraction Using CTC to Build Articulatory Classifiers Without Forced Frame Alignments for Speech Recognition. 798-802
- Tasha Nagamine, Michael L. Seltzer, Nima Mesgarani:
On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models. 803-807
- Ehsan Variani, Tara N. Sainath, Izhak Shafran, Michiel Bacchiani:
Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling. 808-812
- Tara N. Sainath, Bo Li:
Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks. 813-817
Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge
- Mitchell McLaren, Luciana Ferrer, Diego Castán, Aaron Lawson:
The Speakers in the Wild (SITW) Speaker Recognition Database. 818-822
- Mitchell McLaren, Luciana Ferrer, Diego Castán, Aaron Lawson:
The 2016 Speakers in the Wild Speaker Recognition Evaluation. 823-827
- Ondrej Novotný, Pavel Matejka, Oldrich Plchot, Ondrej Glembek, Lukás Burget, Jan Cernocký:
Analysis of Speaker Recognition Systems in Realistic Scenarios of the SITW 2016 Challenge. 828-832
- Oleg Kudashev, Sergey Novoselov, Konstantin Simonchik, Alexander Kozlov:
A Speaker Recognition System for the SITW Challenge. 833-837
- Houman Ghaemmaghami, Md. Hafizur Rahman, Ivan Himawan, David Dean, Ahilan Kanagasundaram, Sridha Sridharan, Clinton Fookes:
Speakers In The Wild (SITW): The QUT Speaker Recognition System. 838-842
- Abbas Khosravani, Mohammad Mehdi Homayounpour:
AUT System for SITW Speaker Recognition Challenge. 843-847
- Waad Ben Kheder, Moez Ajili, Pierre-Michel Bousquet, Driss Matrouf, Jean-François Bonastre:
LIA System for the SITW Speaker Recognition Challenge. 848-852
- Yi Liu, Yao Tian, Liang He, Jia Liu:
Investigating Various Diarization Algorithms for Speaker in the Wild (SITW) Speaker Recognition Challenge. 853-857
Non-Native Speech Perception
- Odette Scharenborg, Juul Coumans, Sofoklis Kakouros, Roeland van Hout:
Does the Importance of Word-Initial and Word-Final Information Differ in Native versus Non-Native Spoken-Word Recognition? 858-862
- Odette Scharenborg, Elea Kolkman, Sofoklis Kakouros, Brechtje Post:
The Effect of Sentence Accent on Non-Native Speech Perception in Noise. 863-867
- Martin Cooke, María Luisa García Lecumberri:
The Effects of Modified Speech Styles on Intelligibility for Non-Native Listeners. 868-872
- Hao Zhang, Fei Chen, Nan Yan, Lan Wang, Feng Shi, Manwa L. Ng:
The Influence of Language Experience on the Categorical Perception of Vowels: Evidence from Mandarin and Korean. 873-877
- Dominic W. Massaro:
Multiple Influences on Vocabulary Acquisition: Parental Input Dominates. 878-882
- Jian Gong, María Luisa García Lecumberri, Martin Cooke:
Can Intensive Exposure to Foreign Language Sounds Affect the Perception of Native Sounds? 883-887
Behavioral Signal Processing and Speaker State and Traits Analytics
- Nikoletta Bassiou, Andreas Tsiartas, Jennifer Smith, Harry Bratt, Colleen Richey, Elizabeth Shriberg, Cynthia M. D'Angelo, Nonye Alozie:
Privacy-Preserving Speech Analytics for Automatic Assessment of Student Collaboration. 888-892
- Md. Nasir, Brian R. Baucom, Shrikanth S. Narayanan, Panayiotis G. Georgiou:
Complexity in Prosody: A Nonlinear Dynamical Systems Approach for Dyadic Conversations; Behavior and Outcomes in Couples Therapy. 893-897
- Shao-Yen Tseng, Sandeep Nallan Chakravarthula, Brian R. Baucom, Panayiotis G. Georgiou:
Couples Behavior Modeling and Annotation Using Low-Resource LSTM Language Models. 898-902
- Laura Fernández Gallardo, Benjamin Weiss:
Speech Likability and Personality-Based Social Relations: A Round-Robin Analysis over Communication Channels. 903-907
- Bo Xiao, Dogan Can, James Gibson, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Behavioral Coding of Therapist Language in Addiction Counseling Using Recurrent Neural Networks. 908-912
- Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah:
Factor Analysis Based Speaker Normalisation for Continuous Emotion Prediction. 913-917
Spoken Term Detection
- Dhananjay Ram, Afsaneh Asaei, Hervé Bourlard:
Subspace Detection of DNN Posterior Probabilities via Sparse Representation for Query by Example Spoken Term Detection. 918-922
- Hongjie Chen, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection. 923-927
- Amir Hossein Harati Nejad Torbati, Joseph Picone:
A Nonparametric Bayesian Approach for Spoken Term Detection by Example Query. 928-932
- Van Tung Pham, Haihua Xu, Xiong Xiao, Nancy F. Chen, Eng Siong Chng, Haizhou Li:
Rescoring Hypothesized Detections of Out-of-Vocabulary Keywords Using Subword Samples. 933-937
- Yimeng Zhuang, Xuankai Chang, Yanmin Qian, Kai Yu:
Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC. 938-942
- Yen-Chen Wu, Tzu-Hsiang Lin, Yang-De Chen, Hung-yi Lee, Lin-Shan Lee:
Interactive Spoken Content Retrieval by Deep Reinforcement Learning. 943-947
Co-Inference of Production and Acoustics
- Elizabeth Godoy, Andrew Dumas, Jennifer Melot, Nicolas Malyska, Thomas F. Quatieri:
Relating Estimated Cyclic Spectral Peak Frequency to Measured Epilarynx Length Using Magnetic Resonance Imaging. 948-952 - Patrick Lumban Tobing, Tomoki Toda, Hirokazu Kameoka, Satoshi Nakamura:
Acoustic-to-Articulatory Inversion Mapping Based on Latent Trajectory Gaussian Mixture Model. 953-957 - Yehoshua Dissen, Joseph Keshet:
Formant Estimation and Tracking Using Deep Learning. 958-962 - Colin Vaz, Asterios Toutios, Shrikanth S. Narayanan:
Convex Hull Convolutive Non-Negative Matrix Factorization for Uncovering Temporal Patterns in Multivariate Time-Series Data. 963-967 - Lauri Juvela, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi, Paavo Alku:
Majorisation-Minimisation Based Optimisation of the Composite Autoregressive System with Application to Glottal Inverse Filtering. 968-972 - Xiaoyun Wang, Xugang Lu, Hisashi Kawai, Seiichi Yamamoto:
F0 Contour Analysis Based on Empirical Mode Decomposition for DNN Acoustic Modeling in Mandarin Speech Recognition. 973-977
Acoustic and Articulatory Phonetics
- Fang Hu, Chunyu Ge:
Vowels and Diphthongs in Cangnan Southern Min Chinese Dialect. 978-982 - Wenqi Hu, Fang Hu, Jian Jin:
Diphthongization of Nuclear Vowels and the Emergence of a Tetraphthong in Hetang Cantonese. 983-987 - Milos Cernak, Philip N. Garner:
PhonVoc: A Phonetic and Phonological Vocoding Toolkit. 988-992 - Liping Xia, Fang Hu:
Vowels and Diphthongs in the Taiyuan Jin Chinese Dialect. 993-997 - Giuseppina Turco, Cécile Fougeron, Nicolas Audibert:
The Effects of Prosody on French V-to-V Coarticulation: A Corpus-Based Study. 998-1001 - Vincenzo Galatà, Lorenzo Spreafico, Alessandro Vietti, Constantijn Kaland:
An Acoustic Analysis of /r/ in Tyrolean. 1002-1006 - Seung-Eun Chang, Minsook Kim:
Hyperarticulated Production of Korean Glides by Age Group. 1007-1010 - Ho-hsien Pan, Hsiao-tung Huang, Shao-Ren Lyu:
Coda Stop and Taiwan Min Checked Tone Sound Changes. 1011-1015
Prosody, Phonation and Voice Quality
- Sarah E. Fenwick, Catherine T. Best, Chris Davis, Michael D. Tyler:
The Influence of Modality and Speaking Style on the Assimilation Type and Categorization Consistency of Non-Native Speech. 1016-1020 - Margaret Zellers:
Prosodic Convergence with Spoken Stimuli in Laboratory Data. 1021-1025 - Charalambos Themistocleous, Angelandria Savva, Andrie Aristodemou:
Effects of Stress on Fricatives: Evidence from Standard Modern Greek. 1026-1029 - Yue Sun, Shudon Hsiao, Yoshinori Sagisaka, Jinsong Zhang:
Analysis of Chinese Syllable Durations in Running Speech of Japanese L2 Learners. 1030-1033 - Catherine Lai, Mireia Farrús, Johanna D. Moore:
Automatic Paragraph Segmentation with Lexical and Prosodic Features. 1034-1038 - Manu Airaksinen, Lauri Juvela, Tom Bäckström, Paavo Alku:
Automatic Glottal Inverse Filtering with Non-Negative Matrix Factorization. 1039-1043 - Soo Jin Park, Caroline Sigouin, Jody Kreiman, Patricia A. Keating, Jinxi Guo, Gary Yeung, Fang-Yu Kuo, Abeer Alwan:
Speaker Identity and Voice Quality: Modeling Human Responses and Automatic Speaker Recognition. 1044-1048 - Sishir Kalita, Luke Horo, Priyankoo Sarmah, S. R. Mahadeva Prasanna, Samarendra Dandapat:
Analysis of Glottal Stop in Assam Sora Language. 1049-1053 - Marc Garellek, Scott Seyfarth:
Acoustic Differences Between English /t/ Glottalization and Phrasal Creak. 1054-1058 - Anders Eriksson, Pier Marco Bertinetto, Mattias Heldner, Rosalba Nodari, Giovanna Lenoci:
The Acoustics of Lexical Stress in Italian as a Function of Stress Level and Speaking Style. 1059-1063 - Antje Schweitzer, Ngoc Thang Vu:
Cross-Gender and Cross-Dialect Tone Recognition for Vietnamese. 1064-1068 - Karthika Vijayan, K. Sri Rama Murty:
Prosody Modification Using Allpass Residual of Speech Signals. 1069-1073 - Sofoklis Kakouros, Joris Pelemans, Lyan Verwimp, Patrick Wambacq, Okko Räsänen:
Analyzing the Contribution of Top-Down Lexical and Bottom-Up Acoustic Cues in the Detection of Sentence Prominence. 1074-1078 - Jeffrey Kallay, Melissa A. Redford:
A Longitudinal Study of Children's Intonation in Narrative Speech. 1079-1083
Speech Production Analysis and Modeling
- Reed Blaylock, Louis Goldstein, Shrikanth S. Narayanan:
Velum Control for Oral Sounds. 1084-1088 - Gayeon Son:
F0 Development in Acquiring Korean Stop Distinction. 1089-1093 - Clara Cohen, Matt Carlson:
Phonetic Reduction Can Lead to Lengthening, and Enhancement Can Lead to Shortening. 1094-1098 - Takayuki Arai:
Mechanical Production of [b], [m] and [w] Using Controlled Labial and Velopharyngeal Gestures. 1099-1103 - Qiang Fang, Yun Chen, Haibo Wang, Jianguo Wei, Jianrong Wang, Xiyu Wu, Aijun Li:
An Improved 3D Geometric Tongue Model. 1104-1107 - Mikko Tiainen, Fatima M. Felisberti, Kaisa Tiippana, Martti Vainio, Juraj Simko, Jiří Lukavský, Lari Vainio:
Congruency Effect Between Articulation and Grasping in Native English Speakers. 1108-1112 - Shamima Najnin, Bonny Banerjee:
Emergence of Vocal Developmental Sequences in a Predictive Coding Model of Speech Acquisition. 1113-1117 - Julien Meyer, Laure Dentel, Fanny Meunier:
Categorization of Natural Spanish Whistled Vowels by Naïve Spanish Listeners. 1118-1121 - Rob Voigt, Dan Jurafsky, Meghan Sumner:
Between- and Within-Speaker Effects of Bilingualism on F0 Variation. 1122-1126 - Calbert Graham, Paula Buttery, Francis Nolan:
Vowel Characteristics in the Assessment of L2 English Pronunciation. 1127-1131 - Ahmed Geneid, Anne-Maria Laukkanen, Anita McAllister, Robert Eklund:
Kulning (Swedish Cattle Calls): Acoustic, EGG, Stroboscopic and High-Speed Video Analyses of an Unusual Singing Style. 1132-1135 - Míša Hejná, Pertti Palo, Scott Moisik:
Glottal Squeaks in VC Sequences. 1136-1140 - Naoya Takahashi, Tofigh Naghibi, Beat Pfister:
Automatic Pronunciation Generation by Utilizing a Semi-Supervised Deep Neural Networks. 1141-1145
Spoken Dialogue Systems
- Xiaohu Liu, Ruhi Sarikaya, Liang Zhao, Yong Ni, Yi-Cheng Pan:
Personalized Natural Language Understanding. 1146-1150 - Layla El Asri, Jing He, Kaheer Suleman:
A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems. 1151-1155 - Spiros Georgiladakis, Georgia Athanasopoulou, Raveesh Meena, José Lopes, Arodami Chorianopoulou, Elisavet Palogiannidi, Elias Iosif, Gabriel Skantze, Alexandros Potamianos:
Root Cause Analysis of Miscommunication Hotspots in Spoken Dialogue Systems. 1156-1160 - Omar Zia Khan, Ruhi Sarikaya:
Making Personal Digital Assistants Aware of What They Do Not Know. 1161-1165 - Rivka Levitan, Stefan Benus, Ramiro H. Gálvez, Agustín Gravano, Florencia Savoretti, Marián Trnka, Andreas Weise, Julia Hirschberg:
Implementing Acoustic-Prosodic Entrainment in a Conversational Avatar. 1166-1170 - Annika Silvervarg, Sofia Lindvall, Jonatan Andersson, Ida Esberg, Christian Jernberg, Filip Frumerie, Arne Jönsson:
Perceived Usability and Cognitive Demand of Secondary Tasks in Spoken Versus Visual-Manual Automotive Interaction. 1171-1175
Show & Tell Session 3
- Pascale Fung, Anik Dey, Farhad Bin Siddique, Ruixi Lin, Yang Yang, Yan Wan, Ricky Ho Yin Chan:
Zara: An Empathetic Interactive Virtual Agent. 1176-1177 - Cristian Tejedor García, David Escudero Mancebo, Enrique Cámara Arenas, César González Ferreras, Valentín Cardeñoso-Payo:
Measuring Pronunciation Improvement in Users of CAPT Tool TipTopTalk! 1178-1179 - Hideki Kawahara:
SparkNG: Interactive MATLAB Tools for Introduction to Speech Production, Perception and Processing Fundamentals and Application of the Aliasing-Free L-F Model Component. 1180-1181 - Erik Marchi, Florian Eyben, Gerhard Hagerer, Björn W. Schuller:
Real-Time Tracking of Speakers' Emotions, States, and Traits on Mobile Platforms. 1182-1183
Special Event: Mindfulness
- Nikki Mirghafori:
Mindfulness Special Event.
Keynote 2: Edward Chang
- Edward Chang:
The Human Speech Cortex. 1184
Special Event: Speaker Comparison for Forensic and Investigative Applications II
- Jean-François Bonastre, Joseph P. Campbell, Anders P. Eriksson, Hirotaka Nakasone, Reva Schwartz:
Speaker Comparison for Forensic and Investigative Applications II.
Special Session: Clinical and Neuroscience-Inspired Vocal Biomarkers of Neurological and Psychiatric Disorders
- Daniel Bone, Somer Bishop, Rahul Gupta, Sungbok Lee, Shrikanth S. Narayanan:
Acoustic-Prosodic and Turn-Taking Features in Interactions with Children with Neurodevelopmental Disorders. 1185-1189 - Daria Hemmerling, Juan Rafael Orozco-Arroyave, Andrzej Skalski, Janusz Gajda, Elmar Nöth:
Automatic Detection of Parkinson's Disease Based on Modulated Vowels. 1190-1194 - Jun Wang, Prasanna V. Kothalkar, Beiming Cao, Daragh Heitzman:
Towards Automatic Detection of Amyotrophic Lateral Sclerosis from Speech Acoustic and Articulatory Samples. 1195-1199 - Gregory A. Ciccarelli, Thomas F. Quatieri, Satrajit S. Ghosh:
Neurophysiological Vocal Source Modeling for Biomarkers of Disease. 1200-1204 - Rachelle L. Horwitz-Martin, Thomas F. Quatieri, Adam C. Lammert, James R. Williamson, Yana Yunusova, Elizabeth Godoy, Daryush D. Mehta, Jordan R. Green:
Relation of Automatically Extracted Formant Trajectories with Intelligibility Loss and Speaking Rate Decline in Amyotrophic Lateral Sclerosis. 1205-1209 - Fabien Ringeval, Erik Marchi, Charline Grossard, Jean Xavier, Mohamed Chetouani, David Cohen, Björn W. Schuller:
Automatic Analysis of Typical and Atypical Encoding of Spontaneous Emotion in the Voice of Children. 1210-1214 - Soheil Khorram, John Gideon, Melvin G. McInnis, Emily Mower Provost:
Recognition of Depression in Bipolar Disorder: Leveraging Cohort and Person-Specific Knowledge. 1215-1219 - Bahman Mirheidari, Daniel Blackburn, Markus Reuber, Traci Walker, Heidi Christensen:
Diagnosing People with Dementia Using Automatic Conversation Analysis. 1220-1224
Special Session: Singing Synthesis Challenge: Fill-In the Gap
- Paul Yaozhu Chan, Minghui Dong, Grace Xue Hui Ho, Haizhou Li:
SERAPHIM: A Wavetable Synthesis System with 3D Lip Animation for Real-Time Speech and Singing Applications on Mobile Platforms. 1225-1229 - Jordi Bonada, Martí Umbert, Merlijn Blaauw:
Expressive Singing Synthesis Based on Unit Selection for the Singing Synthesis Challenge 2016. 1230-1234 - Olivier Perrotin, Christophe d'Alessandro:
Vocal Effort Modification for Singing Synthesis. 1235-1239 - Eder del Blanco, Inma Hernáez, Eva Navas, Xabier Sarasola, Daniel Erro:
Bertsokantari: a TTS Based Singing Synthesis System. 1240-1244 - Lionel Feugère, Christophe d'Alessandro, Samuel Delalez, Luc Ardaillon, Axel Roebel:
Evaluation of Singing Synthesis: Methodology and Case Study with Concatenative and Performative Systems. 1245-1249 - Luc Ardaillon, Celine Chabot-Canet, Axel Roebel:
Expressive Control of Singing Voice Synthesis Using Musical Contexts and a Parametric F0 Model. 1250-1254 - Marius Cotescu:
Optimal Unit Stitching in a Unit Selection Singing Synthesis System. 1255-1259
Conversation and Interaction
- Katherine Hilton:
The Perception of Overlapping Speech: Effects of Speaker Prosody and Listener Attitudes. 1260-1264 - Agustín Gravano, Pablo Brusco, Stefan Benus:
Who Do You Think Will Speak Next? Perception of Turn-Taking Cues in Slovak and Argentine Spanish. 1265-1269 - Juan Manuel Pérez, Ramiro H. Gálvez, Agustín Gravano:
Disentrainment may be a Positive Thing: A Novel Measure of Unsigned Acoustic-Prosodic Synchrony, and its Relation to Speaker Engagement. 1270-1274 - Marcin Wlodarczak, Mattias Heldner:
Respiratory Turn-Taking Cues. 1275-1279 - Emma Rennie, Rebecca Lunsford, Peter A. Heeman:
The Discourse Marker "so" in Turn-Taking and Turn-Releasing Behavior. 1280-1284 - Ethan Sherr-Ziarko:
Acoustic Properties of Formality in Conversational Japanese. 1285-1289
Automatic Learning of Representations
- Thomas Pellegrini, Sandrine Mouysset:
Inferring Phonemic Classes from CNN Activation Maps Using Clustering Techniques. 1290-1294 - Neil Zeghidour, Gabriel Synnaeve, Nicolas Usunier, Emmanuel Dupoux:
Joint Learning of Speaker and Phonetic Similarities with Siamese Networks. 1295-1299 - Vikramjit Mitra, Dimitra Vergyri, Horacio Franco:
Unsupervised Learning of Acoustic Units Using Autoencoders and Kohonen Nets. 1300-1304 - Zhenyao Zhu, Jesse H. Engel, Awni Y. Hannun:
Learning Multiscale Features Directly from Waveforms. 1305-1309 - Michael Heck, Sakriani Sakti, Satoshi Nakamura:
Supervised Learning of Acoustic Models in a Zero Resource Setting to Improve DPGMM Clustering. 1310-1314 - Haihua Xu, Hang Su, Chongjia Ni, Xiong Xiao, Hao Huang, Eng Siong Chng, Haizhou Li:
Semi-Supervised and Cross-Lingual Knowledge Transfer Learnings for DNN Hybrid Acoustic Models Under Low-Resource Conditions. 1315-1319
Language Modeling for Conversational Speech and Confidence Measures
- Taichi Asami, Ryo Masumura, Yushi Aono, Koichi Shinoda:
Recurrent Out-of-Vocabulary Word Detection Using Distribution of Features. 1320-1324 - Naoyuki Kanda, Shoji Harada, Xugang Lu, Hisashi Kawai:
Investigation of Semi-Supervised Acoustic Model Training Based on the Committee of Heterogeneous Neural Networks. 1325-1329 - Sahar Ghannay, Yannick Estève, Nathalie Camelin, Paul Deléglise:
Acoustic Word Embeddings for ASR Error Detection. 1330-1334 - Axel Horndasch, Anton Batliner, Caroline Kaufhold, Elmar Nöth:
Combining Semantic Word Classes and Sub-Word Unit Speech Recognition for Robust OOV Detection. 1335-1339 - Chuandong Xie, Wu Guo, Guoping Hu, Junhua Liu:
Web Data Selection Based on Word Embedding for Low-Resource Speech Recognition. 1340-1344
Topics in Speech Perception
- Jianjing Kuang, Mark Liberman:
Pitch-Range Perception: The Dynamic Interaction Between Voice Quality and Fundamental Frequency. 1350-1354 - Fei Chen, Benson C. L. Chiao:
Comparing the Contributions of Amplitude and Phase to Speech Intelligibility in a Vocoder-Based Speech Synthesis Model. 1355-1358 - Fei Chen:
Modeling Noise Influence to Speech Intelligibility Non-Intrusively by Reduced Speech Dynamic Range. 1359-1362 - Gábor Pintér, Hiroki Watanabe:
Do GMM Phoneme Classifiers Perceive Synthetic Sibilants as Humans Do? 1363-1367 - Marina Frye, Cristiano Micheli, Inga M. Schepers, Gerwin Schalk, Jochem W. Rieger, Bernd T. Meyer:
Neural Responses to Speech-Specific Modulations Derived from a Spectro-Temporal Filter Bank. 1368-1372 - Kimberley Mulder, Louis ten Bosch, Lou Boves:
Comparing Different Methods for Analyzing ERP Signals. 1373-1377 - Robert Eklund, Martin Ingvar:
Supplementary Motor Area Activation in Disfluency Perception: An fMRI Study of Listener Neural Responses to Spontaneously Produced Unfilled and Filled Pauses. 1378-1381 - Daniel Fogerty, Fei Chen:
Vowel Fundamental and Formant Frequency Contributions to English and Mandarin Sentence Intelligibility. 1382-1386
Behavioral Signal Processing and Speaker State and Traits Analytics
- Che-Wei Huang, Shrikanth S. Narayanan:
Attention Assisted Discovery of Sub-Utterance Structure in Speech Emotion Recognition. 1387-1391 - Linchuan Li, Zhiyong Wu, Mingxing Xu, Helen M. Meng, Lianhong Cai:
Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition. 1392-1396 - Jürgen Trouvain, Zofia Malisz:
Inter-Speech Clicks in an Interspeech Keynote. 1397-1401 - Joanna Grzybowska, Stanislaw Kacprzak:
Speaker Age Classification and Regression Using i-Vectors. 1402-1406 - Haoqi Li, Brian R. Baucom, Panayiotis G. Georgiou:
Sparsely Connected and Disjointly Trained Deep Neural Networks for Low Resource Behavioral Annotation: Acoustic Classification in Couples' Therapy. 1407-1411 - Guozhen An, Sarah Ita Levitan, Rivka Levitan, Andrew Rosenberg, Michelle Levine, Julia Hirschberg:
Automatically Classifying Self-Rated Personality Scores from Speech. 1412-1416 - Jill Fain Lehman, Rita Singh:
Estimation of Children's Physical Characteristics from Their Voices. 1417-1421 - Hayakawa Akira, Saturnino Luz, Nick Campbell:
Talking to a System and Talking to a Human: A Study from a Speech-to-Speech, Machine Translation Mediated Map Task. 1422-1426 - Rahul Gupta, Shrikanth S. Narayanan:
Predicting Affective Dimensions Based on Self Assessed Depression Severity. 1427-1431 - Wen-Yu Huang, Shan-Wen Hsiao, Hung-Ching Sun, Ming-Chuan Hsieh, Ming-Hsueh Tsai, Chi-Chun Lee:
Enhancement of Automatic Oral Presentation Assessment System Using Latent N-Grams Word Representation and Part-of-Speech Information. 1432-1436 - Sri Harsha Dumpala, P. Gangamohan, Suryakanth V. Gangashetty, B. Yegnanarayana:
Use of Vowels in Discriminating Speech-Laugh from Laughter and Neutral Speech. 1437-1441 - Kan Kawabata, Visar Berisha, Anna Scaglione, Amy LaCross:
A Convex Model for Linguistic Influence in Group Conversations. 1442-1446 - James Gibson, Dogan Can, Bo Xiao, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
A Deep Learning Approach to Modeling Empathy in Addiction Counseling. 1447-1451 - Kun-Yi Huang, Chung-Hsien Wu, Yu-Ting Kuo, Fong-Lin Jang:
Unipolar Depression vs. Bipolar Disorder: An Elicitation-Based Approach to Short-Term Detection of Mood Disorder. 1452-1456
Speech Synthesis Poster
- Abir Masmoudi, Mariem Ellouze, Fethi Bougares, Yannick Estève, Lamia Hadrich Belguith:
Conditional Random Fields for the Tunisian Dialect Grapheme-to-Phoneme Conversion. 1457-1461 - Sittipong Saychum, Sarawoot Kongyoung, Anocha Rugchatjaroen, Patcharika Chootrakool, Sawit Kasuriya, Chai Wutiwiwatchai:
Efficient Thai Grapheme-to-Phoneme Conversion Using CRF-Based Joint Sequence Modeling. 1462-1466 - Aurore Jaumard-Hakoun, Kele Xu, Clémence Leboullenger, Pierre Roussel-Ragot, Bruce Denby:
An Articulatory-Based Singing Voice Synthesis Using Tongue and Lips Imaging. 1467-1471 - Xu Li, Zhiyong Wu, Helen M. Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai:
Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis. 1472-1476 - Xu Li, Zhiyong Wu, Helen M. Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai:
Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data. 1477-1481 - Sarah Taylor, Akihiro Kato, Iain A. Matthews, Ben P. Milner:
Audio-to-Visual Speech Conversion Using Deep Neural Networks. 1482-1486 - Toru Nakashika, Yasuhiro Minami:
Generative Acoustic-Phonemic-Speaker Model Based on Three-Way Restricted Boltzmann Machine. 1487-1491 - Asterios Toutios, Tanner Sorensen, Krishna Somandepalli, Rachel Alexander, Shrikanth S. Narayanan:
Articulatory Synthesis Based on Real-Time Magnetic Resonance Imaging Data. 1492-1496 - Xurong Xie, Xunying Liu, Lan Wang:
Deep Neural Network Based Acoustic-to-Articulatory Inversion Using Phone Sequence Information. 1497-1501 - Zheng-Chen Liu, Zhen-Hua Ling, Li-Rong Dai:
Articulatory-to-Acoustic Conversion with Cascaded Prediction of Spectral and Excitation Features Using Neural Networks. 1502-1506 - Christopher Liberatore, Ricardo Gutierrez-Osuna:
Generating Gestural Scores from Acoustics Through a Sparse Anchor-Based Representation of Speech. 1507-1511 - David Guennec, Damien Lolive:
On the Suitability of Vocalic Sandwiches in a Corpus-Based TTS Engine. 1512-1516 - Decha Moungsri, Tomoki Koriyama, Takao Kobayashi:
Unsupervised Stress Information Labeling Using Gaussian Process Latent Variable Model for Statistical Speech Synthesis. 1517-1521 - Jinfu Ni, Yoshinori Shiga, Hisashi Kawai:
Using Zero-Frequency Resonator to Extract Multilingual Intonation Structure. 1522-1526
Resources and Annotation of Resources
- Jia Yu, Xiong Xiao, Lei Xie, Eng Siong Chng, Haizhou Li:
A DNN-HMM Approach to Story Segmentation. 1527-1531 - Jean-Philippe Goldman, Pierre-Edouard Honnet, Robert A. J. Clark, Philip N. Garner, Maria Ivanova, Alexandros Lazaridis, Hui Liang, Tiago Macedo, Beat Pfister, Manuel Sam Ribeiro, Eric Wehrli, Junichi Yamagishi:
The SIWIS Database: A Multilingual Speech Database with Acted Emphasis. 1532-1535 - Emre Yilmaz, Henk van den Heuvel, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, David A. van Leeuwen:
Open Source Speech and Language Resources for Frisian. 1536-1540 - Andreas Kathol, Elizabeth Shriberg, Massimiliano de Zambotti:
The SRI CLEO Speaker-State Corpus. 1541-1544 - Nancy F. Chen, Rong Tong, Darren Wee, Pei Xuan Lee, Bin Ma, Haizhou Li:
SingaKids-Mandarin: Speech Corpus of Singaporean Children Speaking Mandarin Chinese. 1545-1549 - Colleen Richey, Cynthia M. D'Angelo, Nonye Alozie, Harry Bratt, Elizabeth Shriberg:
The SRI Speech-Based Collaborative Learning Corpus. 1550-1554 - Anil Ramakrishna, Rahul Gupta, Ruth B. Grossman, Shrikanth S. Narayanan:
An Expectation Maximization Approach to Joint Modeling of Multidimensional Ratings Derived from Multiple Annotators. 1555-1559 - Jindrich Matousek, Daniel Tihelka:
Voting Detector: A Combination of Anomaly Detectors to Reveal Annotation Errors in TTS Corpora. 1560-1564
Show & Tell Session 4
- Mario Corrales-Astorgano, David Escudero Mancebo, César González Ferreras, Yurena Gutiérrez-González, Valle Flores-Lucas, Valentín Cardeñoso-Payo, Lourdes Aguilar-Cuevas:
The Magic Stone: A Video Game to Improve Communication Skills of People with Intellectual Disabilities. 1565-1566 - Finnian Kelly, Anil Alexander, Oscar Forth, Samuel Kent, Jonas Lindh, Joel Åkesson:
Identifying Perceptually Similar Voices with a Speaker Recognition System Using Auto-Phonetic Features. 1567-1568 - Kristy James, Alexander Hewer, Ingmar Steiner, Stefanie Wuhrer:
A Real-Time Framework for Visual Feedback of Articulatory Data Using Statistical Shape Models. 1569-1570 - Alex Marin, Paul A. Crook, Omar Zia Khan, Vasiliy Radostev, Khushboo Aggarwal, Ruhi Sarikaya:
Flexible, Rapid Authoring of Goal-Orientated, Multi-Turn Dialogues Using the Task Completion Platform. 1571-1572
Acoustic Model Adaptation
- Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Takuya Yoshioka, Dung T. Tran, Tomohiro Nakatani:
Context Adaptive Neural Network for Rapid Adaptation of Deep CNN Based Acoustic Models. 1573-1577 - Boon Pang Lim, Faith Wong, Yuyao Li, Jia Wei Bay:
Transfer Learning with Bottleneck Feature Networks for Whispered Speech Recognition. 1578-1582 - Tasha Nagamine, Zhuo Chen, Nima Mesgarani:
Adaptation of Neural Networks Constrained by Prior Statistics of Node Co-Activations. 1583-1587 - Masayuki Suzuki, Ryuki Tachibana, Samuel Thomas, Bhuvana Ramabhadran, George Saon:
Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings. 1588-1592 - Lahiru Samarakoon, Khe Chai Sim:
Subspace LHUC for Fast Adaptation of Deep Neural Network Acoustic Models. 1593-1597 - Joachim Fainberg, Peter Bell, Mike Lincoln, Steve Renals:
Improving Children's Speech Recognition Through Out-of-Domain Data Augmentation. 1598-1602
Special Session: Sharing Research and Education Resources for Understanding Speech Processing
- Florian Metze, Eric Riebling, Anne S. Warlaumont, Elika Bergelson:
Virtual Machines and Containers as a Platform for Experimentation. 1603-1607 - Phil D. Green, Ricard Marxer, Stuart P. Cunningham, Heidi Christensen, Frank Rudzicz, Maria Yancheva, André Coy, Massimiliano Malavasi, Lorenzo Desideri, Fabio Tamburini:
CloudCAST - Remote Speech Technology for Speech Professionals. 1608-1612 - Thomas Hain, Jeremy Christian, Oscar Saz, Salil Deena, Madina Hasan, Raymond W. M. Ng, Rosanna Milner, Mortaza Doulaty, Yulan Liu:
webASR 2 - Improved Cloud Based Speech Technology. 1613-1617 - Andrew R. Plummer, Mary E. Beckman:
Sharing Speech Synthesis Software for Research and Education Within Low-Tech and Low-Resource Communities. 1618-1622 - Ronald L. Sprouse, Keith Johnson:
The Berkeley Phonetics Machine. 1623-1626 - Rebecca Bates, Eric Fosler-Lussier, Florian Metze, Martha A. Larson, Gina-Anne Levow, Emily Mower Provost:
Experiences with Shared Resources for Research and Education in Speech and Language Processing. 1627-1631
Special Session: Voice Conversion Challenge
- Tomoki Toda, Ling-Hui Chen, Daisuke Saito, Fernando Villavicencio, Mirjam Wester, Zhizheng Wu, Junichi Yamagishi:
The Voice Conversion Challenge 2016. 1632-1636 - Mirjam Wester, Zhizheng Wu, Junichi Yamagishi:
Analysis of the Voice Conversion Challenge 2016 Evaluation Results. 1637-1641 - Ling-Hui Chen, Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Li-Rong Dai:
The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion. 1642-1646 - Seyed Hamidreza Mohammadi, Alexander Kain:
A Voice Conversion Mapping Function Based on a Stacked Joint-Autoencoder. 1647-1651 - Yi-Chiao Wu, Hsin-Te Hwang, Chin-Cheng Hsu, Yu Tsao, Hsin-Min Wang:
Locally Linear Embedding for Exemplar-Based Spectral Conversion. 1652-1656 - Fernando Villavicencio, Junichi Yamagishi, Jordi Bonada, Felipe Espic:
Applying Spectral Normalisation and Efficient Envelope Estimation and Statistical Transformation for the Voice Conversion Challenge 2016. 1657-1661 - Daniel Erro, Agustín Alonso, Luis Serrano, David Tavarez, Igor Odriozola, Xabier Sarasola, Eder del Blanco, Jon Sánchez, Ibon Saratxaga, Eva Navas, Inma Hernáez:
ML Parameter Generation with a Reformulated MGE Training Criterion - Participation in the Voice Conversion Challenge 2016. 1662-1666 - Kazuhiro Kobayashi, Shinnosuke Takamichi, Satoshi Nakamura, Tomoki Toda:
The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016. 1667-1671
Intelligibility and Masking
- Maury Lander-Portnoy:
Release from Energetic Masking Caused by Repeated Patterns of Glimpsing Windows. 1672-1676 - Bobby Gibbs II, Daniel Fogerty:
Glimpsing Predictions for Natural and Vocoded Sentence Intelligibility During Modulation Masking: Effect of the Glimpse Cutoff Criterion. 1677-1681 - Li Xu:
Temporal Envelopes in Sine-Wave Speech Recognition. 1682-1686 - Jing Liu, Rosanna H. N. Tong, Fei Chen:
Understanding Periodically Interrupted Mandarin Speech. 1687-1691 - Fei Chen, Daniel Fogerty:
Factors Affecting the Intelligibility of Sine-Wave Speech. 1692-1695 - Nao Hodoshima:
Effects of Urgent Speech and Preceding Sounds on Speech Intelligibility in Noisy and Reverberant Environments. 1696-1699
Robust Speaker Recognition and Anti-Spoofing
- Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Hong Yu, Tomi Kinnunen, Nicholas W. D. Evans, Zheng-Hua Tan:
Integrated Spoofing Countermeasures and Automatic Speaker Verification: An Evaluation on ASVspoof 2015. 1700-1704 - Pavel Korshunov, Sébastien Marcel:
Cross-Database Evaluation of Audio-Based Spoofing Detection Systems. 1705-1709 - Kaavya Sriskandaraja, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah:
Investigation of Sub-Band Discriminative Information Between Spoofed and Genuine Speech. 1710-1714 - Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li:
An Investigation of Spoofing Speech Detection Under Additive Noise and Reverberant Conditions. 1715-1719 - Md. Sahidullah, Rosa González Hautamäki, Dennis Alexander Lehmann Thomsen, Tomi Kinnunen, Zheng-Hua Tan, Ville Hautamäki, Robert Parts, Martti Pitkänen:
Robust Speaker Recognition with Combined Use of Acoustic and Throat Microphone Speech. 1720-1724 - Zhong Meng, Biing-Hwang Juang:
Statistical Modeling of Speaker's Voice with Temporal Co-Location for Active Voice Authentication. 1725-1729
Speech Enhancement and Applications
- Johannes Fischer, Tom Bäckström:
Joint Enhancement and Coding of Speech by Incorporating Wiener Filtering in a CELP Codec. 1730-1734 - Hong Liu, Xiuling Wang, Miao Sun, Cheng Pang:
Multi-Channel Linear Prediction Based on Binaural Coherence for Speech Dereverberation. 1735-1739 - Martin Blass, Pejman Mowlaee, W. Bastiaan Kleijn:
Single-Channel Speech Enhancement Using Double Spectrum. 1740-1744 - Lukas Drude, Bhiksha Raj, Reinhold Haeb-Umbach:
On the Appropriateness of Complex-Valued Neural Networks for Speech Enhancement. 1745-1749 - Steffen Zeiler, Hendrik Meutzner, Ahmed Hussen Abdelaziz, Dorothea Kolossa:
Introducing the Turbo-Twin-HMM for Audio-Visual Speech Enhancement. 1750-1754 - Constantin Spille, Hendrik Kayser, Hynek Hermansky, Bernd T. Meyer:
Assessing Speech Quality in Speech-Aware Hearing Aids Based on Phoneme Posteriorgrams. 1755-1759
Speech Analysis
- Dhananjaya N. Gowda, Paavo Alku:
Time-Varying Quasi-Closed-Phase Weighted Linear Prediction Analysis of Speech for Accurate Formant Detection and Tracking. 1760-1764 - Yongwan Lim, Sajan Goud Lingala, Asterios Toutios, Shrikanth S. Narayanan, Krishna S. Nayak:
Improved Depiction of Tissue Boundaries in Vocal Tract Real-Time MRI Using Automatic Off-Resonance Correction. 1765-1769 - Merlijn Blaauw, Jordi Bonada:
Modeling and Transforming Speech Using Variational Autoencoders. 1770-1774 - Chandra Sekhar Seelamantula:
Phase-Encoded Speech Spectrograms. 1775-1779 - Peter Birkholz, Petko Bakardjiev, Steffen Kürbis, Rico Petrick:
Towards Minimally Invasive Velar State Detection in Normal and Silent Speech. 1780-1784 - Jianshu Zhang, Jian Tang, Li-Rong Dai:
RNN-BLSTM Based Multi-Pitch Estimation. 1785-1789 - Masanori Morise, Hideki Kawahara:
TUSK: A Framework for Overviewing the Performance of F0 Estimators. 1790-1794 - Pradeep Rengaswamy, Gurunath Reddy M., K. Sreenivasa Rao, Pallab Dasgupta:
A Robust Non-Parametric and Filtering Based Approach for Glottal Closure Instant Detection. 1795-1799
Speaker Recognition
- Rahim Saeidi, Ilkka Huhtakallio, Paavo Alku:
Analysis of Face Mask Effect on Speaker Recognition. 1800-1804 - Elliot Singer, Tyler Campbell, Douglas A. Reynolds:
Data Selection for Within-Class Covariance Estimation. 1805-1809 - Marc Ferras, Srikanth R. Madikeri, Subhadeep Dey, Petr Motlíček, Hervé Bourlard:
Inter-Task System Fusion for Speaker Recognition. 1810-1814 - Zhenchun Lei, Yanhong Wan, Jian Luo, Yingen Yang:
Mahalanobis Metric Scoring Learned from Weighted Pairwise Constraints in I-Vector Speaker Recognition System. 1815-1819 - Meet H. Soni, Tanvina B. Patel, Hemant A. Patil:
Novel Subband Autoencoder Features for Detection of Spoofed Speech. 1820-1824 - Mitchell McLaren, Diego Castán, Luciana Ferrer, Aaron Lawson:
On the Issue of Calibration in DNN-Based Speaker Recognition Systems. 1825-1829 - Waad Ben Kheder, Driss Matrouf, Moez Ajili, Jean-François Bonastre:
Probabilistic Approach Using Joint Long and Short Session i-Vectors Modeling to Deal with Short Utterances for Speaker Recognition. 1830-1834 - Ahilan Kanagasundaram, David Dean, Sridha Sridharan, Clinton Fookes, Ivan Himawan:
Short Utterance Variance Modelling and Utterance Partitioning for PLDA Speaker Verification. 1835-1838 - Nicolai Bæk Thomsen, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Børge Lindberg, Søren Holdt Jensen:
Speaker-Dependent Dictionary-Based Speech Enhancement for Text-Dependent Speaker Verification. 1839-1843 - Chengzhu Yu, Chunlei Zhang, Finnian Kelly, Abhijeet Sangwan, John H. L. Hansen:
Text-Available Speaker Recognition System for Forensic Applications. 1844-1847 - Qingyang Hong, Lin Li, Lihong Wan, Jun Zhang, Feng Tong:
Transfer Learning for Speaker Verification on Short Utterances. 1848-1852 - Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong-Aik Lee:
Twin Model G-PLDA for Duration Mismatch Compensation in Text-Independent Speaker Verification. 1853-1857 - Xiao-Lei Zhang:
Universal Background Sparse Coding and Multilayer Bootstrap Network for Speaker Clustering. 1858-1862 - Yao Tian, Meng Cai, Liang He, Wei-Qiang Zhang, Jia Liu:
Improving Deep Neural Networks Based Speaker Verification Using Unlabeled Data. 1863-1867
Decoding, System Combination
- Naoyuki Kanda, Xugang Lu, Hisashi Kawai:
Maximum a posteriori Based Decoding for CTC Acoustic Models. 1868-1872 - Afsaneh Asaei, Gil Luyet, Milos Cernak, Hervé Bourlard:
Phonetic and Phonological Posterior Search Space Hashing Exploiting Class-Specific Sparsity Structures. 1873-1877 - George Tucker, Minhua Wu, Ming Sun, Sankaran Panchapagesan, Gengshen Fu, Shiv Vitaladevuni:
Model Compression Applied to Small-Footprint Keyword Spotting. 1878-1882 - Angel Mario Castro Martinez, Marc René Schädler:
Why do ASR Systems Despite Neural Nets Still Depend on Robust Features. 1883-1887 - Qing He, Gregory W. Wornell, Wei Ma:
An Adaptive Multi-Band System for Low Power Voice Command Recognition. 1888-1892 - Michael Price, Anantha P. Chandrakasan, James R. Glass:
Memory-Efficient Modeling and Search Techniques for Hardware ASR Decoders. 1893-1897 - Jingzhou Yang, Anton Ragni, Mark J. F. Gales, Kate M. Knill:
Log-Linear System Combination Using Structured Support Vector Machines. 1898-1902 - Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu:
Efficient Segmental Cascades for Speech Recognition. 1903-1907 - Sirui Xu, Eric Fosler-Lussier:
A WFST Framework for Single-Pass Multi-Stream Decoding. 1908-1912 - William Hartmann, Le Zhang, Kerri Barnes, Roger Hsiao, Stavros Tsakalidis, Richard M. Schwartz:
Comparison of Multiple System Combination Techniques for Keyword Spotting. 1913-1917 - Masato Obara, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh:
Rescoring by Combination of Posteriorgram Score and Subword-Matching Score for Use in Query-by-Example. 1918-1922 - Zhehuai Chen, Wei Deng, Tao Xu, Kai Yu:
Phone Synchronous Decoding with CTC Lattice. 1923-1927
Special Session: Clinical and Neuroscience-Inspired Vocal Biomarkers of Neurological and Psychiatric Disorders
- Saurabh Sahu, Carol Y. Espy-Wilson:
Speech Features for Depression Detection. 1928-1932 - Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Jesús Francisco Vargas-Bonilla, Elmar Nöth:
Parkinson's Disease Progression Assessment from Speech Using GMM-UBM. 1933-1937 - Jochen Weiner, Christian Herff, Tanja Schultz:
Speech-Based Detection of Alzheimer's Disease in Conversational German. 1938-1942 - Sharifa Alghowinem, Roland Goecke, Julien Epps, Michael Wagner, Jeffrey F. Cohn:
Cross-Cultural Depression Recognition from Vocal Biomarkers. 1943-1947 - Luke Zhou, Kathleen C. Fraser, Frank Rudzicz:
Speech Recognition in Alzheimer's Disease and in its Assessment. 1948-1952 - Florian B. Pokorny, Peter B. Marschik, Christa Einspieler, Björn W. Schuller:
Does She Speak RTT? Towards an Earlier Identification of Rett Syndrome Through Intelligent Pre-Linguistic Vocalisation Analysis. 1953-1957 - Massimo Pettorino, Maria Grazia Busà, Elisa Pellegrino:
Speech Rhythm in Parkinson's Disease: A Study on Italian. 1958-1961
Show & Tell Session 5
- Xavier Anguera, Vu Van:
English Language Speech Assistant. 1962-1963 - Allen Guo, Arlo Faria, Korbinian Riedhammer:
Remeeting - Deep Insights to Conversations. 1964-1965 - Paul Yaozhu Chan, Minghui Dong, Grace Xue Hui Ho, Haizhou Li:
SERAPHIM Live! - Singing Synthesis for the Performer, the Composer, and the 3D Game Developer. 1966-1967 - Fabrice Malfrère, Olivier Deroo, Emmanuelle Franques, Jonathan Hourez, Nicolas Mazars, Vincent Pagel, Geoffrey Wilfart:
My-Own-Voice: A Web Service That Allows You to Create a Text-to-Speech Voice From Your Own Voice. 1968-1969
Keynote 3: Anne Fernald
- Anne Fernald:
Talking with Kids Really Matters: Early Language Experience Shapes Later Life Chances. 1970
Far-Field Speech Processing
- Tara N. Sainath, Arun Narayanan, Ron J. Weiss, Ehsan Variani, Kevin W. Wilson, Michiel Bacchiani, Izhak Shafran:
Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction. 1971-1975 - Bo Li, Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Michiel Bacchiani:
Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition. 1976-1980 - Hakan Erdogan, John R. Hershey, Shinji Watanabe, Michael I. Mandel, Jonathan Le Roux:
Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks. 1981-1985 - Cristina Guerrero, Georgina Tryfou, Maurizio Omologo:
Channel Selection for Distant Speech Recognition Exploiting Cepstral Distance. 1986-1990 - Michael I. Mandel, Jon Barker:
Multichannel Spatial Clustering for Robust Far-Field Automatic Speech Recognition in Mismatched Conditions. 1991-1995 - Vijayaditya Peddinti, Vimal Manohar, Yiming Wang, Daniel Povey, Sanjeev Khudanpur:
Far-Field ASR Without Parallel Data. 1996-2000
Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception, Sincerity & Native Language
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron C. Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini:
The INTERSPEECH 2016 Computational Paralinguistics Challenge: Deception, Sincerity & Native Language. 2001-2005 - Björn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron C. Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini:
The Deception Sub-Challenge: The Data. - Sarah Ita Levitan, Guozhen An, Min Ma, Rivka Levitan, Andrew Rosenberg, Julia Hirschberg:
Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection. 2006-2010 - Shahin Amiriparian, Jouni Pohjalainen, Erik Marchi, Sergey Pugachevskiy, Björn W. Schuller:
Is Deception Emotional? An Emotion-Driven Predictive Approach. 2011-2015 - Claude Montacié, Marie-José Caraty:
Prosodic Cues and Answer Type Detection for the Deception Sub-Challenge. 2016-2020 - Björn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron C. Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini:
The Sincerity Sub-Challenge: The Data. - Brandon M. Booth, Rahul Gupta, Pavlos Papadopoulos, Ruchir Travadi, Shrikanth S. Narayanan:
Automatic Estimation of Perceived Sincerity from Spoken Language. 2021-2025 - Gábor Gosztolya, Tamás Grósz, György Szaszák, László Tóth:
Estimating the Sincerity of Apologies in Speech by DNN Rank Learning and Prosodic Analysis. 2026-2030 - Hung-Shin Lee, Yu Tsao, Chi-Chun Lee, Hsin-Min Wang, Wei-Cheng Lin, Wei-Chen Chen, Shan-Wen Hsiao, Shyh-Kang Jeng:
Minimization of Regression and Ranking Losses with Shallow Neural Networks on Automatic Sincerity Evaluation. 2031-2035 - Robert Herms:
Prediction of Deception and Sincerity from Speech Using Automatic Phone Recognition-Based Features. 2036-2040 - Yue Zhang, Felix Weninger, Zhao Ren, Björn W. Schuller:
Sincerity and Deception in Speech: Two Sides of the Same Coin? A Transfer- and Multi-Task Learning Perspective. 2041-2045 - Heysem Kaya, Alexey A. Karpov:
Fusing Acoustic Feature Representations for Computational Paralinguistics Tasks. 2046-2050
Special Session: Speech, Audio, and Language Processing Techniques Applied to Bird and Animal Vocalizations
- Naomi Harte, Peter Jancovic, Karl-L. Schuchmann:
Introduction. - Naomi Harte, Peter Jancovic, Karl-L. Schuchmann:
Poster Overview Presentations. - Naomi Harte, Peter Jancovic, Karl-L. Schuchmann:
Discussion. - Naomi Harte, Peter Jancovic, Karl-L. Schuchmann:
Closing Remarks.
Dialogue Systems and Analysis of Dialogue
- Merwan Barlier, Romain Laroche, Olivier Pietquin:
A Stochastic Model for Computer-Aided Human-Human Dialogue. 2051-2055 - Gaël Lejeune, François Rioult, Bruno Crémilleux:
Highlighting Psychological Features for Predicting Child Interjections During Story Telling. 2056-2059 - Kai Sun, Su Zhu, Lu Chen, Siqiu Yao, Xueyang Wu, Kai Yu:
Hybrid Dialogue State Tracking for Real World Human-to-Human Dialogues. 2060-2064 - Gaurav Fotedar, Aditya Gaonkar P., Saikat Chatterjee, Prasanta Kumar Ghosh:
Automatic Recognition of Social Roles Using Long Term Role Transitions in Small Group Interactions. 2065-2069 - Paul Van Eecke, Raquel Fernández:
On the Influence of Gender on Interruptions in Multiparty Dialogue. 2070-2074 - Ian Beaver, Cynthia Freeman:
Detection of User Escalation in Human-Computer Interactions. 2075-2079
Interaction between Speech Production and Perception
- Marie-Lou Barnaud, Julien Diard, Pierre Bessière, Jean-Luc Schwartz:
Assessing Idiosyncrasies in a Bayesian Model of Speech Communication. 2080-2084 - Maria K. Wolters, Najoung Kim, Jung-Ho Kim, Sarah E. MacPherson, Jong-Chan Park:
Prosodic and Linguistic Analysis of Semantic Fluency Data: A Window into Speech Production and Cognition. 2085-2089 - William F. Katz, Divya Prabhakaran:
Sensorimotor Response to Visual Imagery of Tongue Displacement. 2090-2094 - Tiphaine Caudrelier, Pascal Perrier, Jean-Luc Schwartz, Amélie Rochet-Capellan:
Does Auditory-Motor Learning of Speech Transfer from the CV Syllable to the CVCV Word? 2095-2099 - Antje Schweitzer, Michael Walsh:
Exemplar Dynamics in Phonetic Convergence of Speech Rate. 2100-2104 - Outi Tuomainen, Valérie Hazan:
Articulation Rate in Adverse Listening Conditions in Younger and Older Adults. 2105-2109
Multimodal Processing
- Julia Olcoz, Oscar Saz, Thomas Hain:
Error Correction in Lightly Supervised Alignment of Broadcast Subtitles. 2110-2114 - Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain:
Automatic Genre and Show Identification of Broadcast Media. 2115-2119 - Guan-Lin Chao, William Chan, Ian R. Lane:
Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments. 2120-2124 - Amit Aides, Hagai Aronowitz:
Text-Dependent Audiovisual Synchrony Detection for Spoofing Detection in Mobile Person Recognition. 2125-2129 - Fei Tao, John H. L. Hansen, Carlos Busso:
Improving Boundary Estimation in Audiovisual Speech Activity Detection Using Bayesian Information Criterion. 2130-2134 - Sebastian Gergen, Steffen Zeiler, Ahmed Hussen Abdelaziz, Robert M. Nickel, Dorothea Kolossa:
Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR. 2135-2139
Pitch, Tone, and Music
- Anna M. Kruspe:
Retrieval of Textual Song Lyrics from Sung Inputs. 2140-2144 - Jiahong Yuan, Mark Liberman:
Phoneme, Phone Boundary, and Tone in Automatic Scoring of Mandarin Proficiency. 2145-2149 - Charles Chen, Razvan C. Bunescu, Li Xu, Chang Liu:
Tone Classification in Mandarin Chinese Using Convolutional Neural Networks. 2150-2154 - Vishala Pannala, G. Aneeja, Sudarsana Reddy Kadiri, B. Yegnanarayana:
Robust Estimation of Fundamental Frequency Using Single Frequency Filtering Approach. 2155-2159 - Ryunosuke Daido, Yuji Hisaminato:
A Fast and Accurate Fundamental Frequency Estimator Using Recursive Moving Average Filters. 2160-2164 - Prateek Verma, Ronald W. Schafer:
Frequency Estimation from Waveforms Using Multi-Layered Neural Networks. 2165-2169
Speaker Diarization and Recognition
- Douglas E. Sturim, William M. Campbell:
Speaker Linking and Applications Using Non-Parametric Hashing Methods. 2170-2174 - Gaël Le Lan, Delphine Charlet, Anthony Larcher, Sylvain Meignier:
Iterative PLDA Adaptation for Speaker Diarization. 2175-2179 - Harishchandra Dubey, Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen:
A Speaker Diarization System for Studying Peer-Led Team Learning Groups. 2180-2184 - Rosanna Milner, Thomas Hain:
DNN-Based Speaker Clustering for Speaker Diarisation. 2185-2189 - Itshak Lapidot, Jean-François Bonastre:
On the Importance of Efficient Transition Modeling for Speaker Diarization. 2190-2193 - Gregory Sell, Alan McCree, Daniel Garcia-Romero:
Priors for Speaker Counting and Diarization with AHC. 2194-2198 - Nauman Dawalatabad, Srikanth R. Madikeri, C. Chandra Sekhar, Hema A. Murthy:
Two-Pass IB Based Speaker Diarization System Using Meeting-Specific ANN Based Features. 2199-2203 - Zeyan Oo, Yuta Kawakami, Longbiao Wang, Seiichi Nakagawa, Xiong Xiao, Masahiro Iwahashi:
DNN-Based Amplitude and Phase Feature Enhancement for Noise Robust Speaker Identification. 2204-2208 - Ulrich Scherhag, Andreas Nautsch, Christian Rathgeb, Christoph Busch:
Unit-Selection Attack Detection Based on Unfiltered Frequency-Domain Features. 2209-2213 - Mairym Lloréns Monteserín, Jason D. Zevin:
Investigating the Impact of Dialect Prestige on Lexical Decision. 2214-2218 - Jinxi Guo, Gary Yeung, Deepak Muralidharan, Harish Arsikere, Amber Afshan, Abeer Alwan:
Speaker Verification Using Short Utterances with DNN-Based Estimation of Subglottal Acoustic Features. 2219-2222 - Hang Su, Steven Wegmann:
Factor Analysis Based Speaker Verification Using ASR. 2223-2227 - Jeroen Zegers, Hugo Van hamme:
Joint Sound Source Separation and Speaker Recognition. 2228-2232 - Naveen Kumar, Md. Nasir, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Robust Multichannel Gender Classification from Speech in Movie Audio. 2233-2237
Speech Synthesis Poster
- Xavi Gonzalvo, Siamak Tazari, Chun-an Chan, Markus Becker, Alexander Gutkin, Hanna Silén:
Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer. 2238-2242 - Wenfu Wang, Shuang Xu, Bo Xu:
First Step Towards End-to-End Parametric TTS Synthesis: Generating Spectral Parameters with Neural Attention. 2243-2247 - Zhengqi Wen, Ya Li, Jianhua Tao:
The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network Based Speech Synthesis. 2248-2252 - Eunwoo Song, Frank K. Soong, Hong-Goo Kang:
Improved Time-Frequency Trajectory Excitation Vocoder for DNN-Based Speech Synthesis. 2253-2257 - Yamato Ohtani, Koichiro Mori, Masahiro Morita:
Voice Quality Control Using Perceptual Expressions for Statistical Parametric Speech Synthesis Based on Cluster Adaptive Training. 2258-2262 - Felipe Espic, Cassia Valentini-Botinhao, Zhizheng Wu, Simon King:
Waveform Generation Based on Signal Reshaping for Statistical Parametric Speech Synthesis. 2263-2267 - Yi Zhao, Daisuke Saito, Nobuaki Minematsu:
Speaker Representations for Speaker Adaptation in Multiple Speakers' BLSTM-RNN-Based Speech Synthesis. 2268-2272 - Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, Przemyslaw Szczepaniak:
Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices. 2273-2277 - Nobukatsu Hojo, Yusuke Ijima, Hideyuki Mizuno:
An Investigation of DNN-Based Speech Synthesis Using Speaker Codes. 2278-2282 - Lauri Juvela, Xin Wang, Shinji Takaki, Manu Airaksinen, Junichi Yamagishi, Paavo Alku:
Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks. 2283-2287 - Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Model Integration for HMM- and DNN-Based Speech Synthesis Using Product-of-Experts Framework. 2288-2292 - Blaise Potard, Matthew P. Aylett, David A. Baude, Petr Motlícek:
Idlak Tangle: An Open Source Kaldi Based Parametric Speech Synthesiser Based on DNN. 2293-2297 - Alexandros Lazaridis, Milos Cernak, Philip N. Garner:
Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody. 2298-2302 - Chen-Yu Chiang:
On Smoothing and Enhancing Dynamics of Pitch Contours Represented by Discrete Orthogonal Polynomials for Prosody Generation. 2303-2307 - Anandaswarup Vadapalli, Suryakanth V. Gangashetty:
An Investigation of Recurrent Neural Network Architectures Using Word Embeddings for Phrase Break Prediction. 2308-2312 - Hao Liu, Heng Lu, Xu Shao, Yi Xu:
Model-Based Parametric Prosody Synthesis with Deep Neural Network. 2313-2317
Language Model Adaptation
- Thomas Drugman, Janne Pylkkönen, Reinhard Kneser:
Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models. 2318-2322 - Vitaly Kuznetsov, Hank Liao, Mehryar Mohri, Michael Riley, Brian Roark:
Learning N-Gram Language Models from Uncertain Data. 2323-2327 - Barlas Oguz, Issac Alphonso, Shuangyu Chang:
Entropy Based Pruning for Non-Negative Matrix Based Language Models with Contextual Features. 2328-2332 - Siva Reddy Gangireddy, Pawel Swietojanski, Peter Bell, Steve Renals:
Unsupervised Adaptation of Recurrent Neural Network Language Models. 2333-2337 - Yoni Halpern, Keith B. Hall, Vlad Schogol, Michael Riley, Brian Roark, Gleb Skobeltsyn, Martin Bäuml:
Contextual Prediction Models for Speech Recognition. 2338-2342 - Salil Deena, Madina Hasan, Mortaza Doulaty, Oscar Saz, Thomas Hain:
Combining Feature and Model-Based Adaptation of RNNLMs for Multi-Genre Broadcast Speech Recognition. 2343-2347
Show & Tell Session 6
- Michael C. Brady:
A Low Cost Desktop Robot and Tele-Presence Device for Interactive Speech Research. 2348-2349 - Simon Stone, Peter Birkholz:
Silent-Speech Command Word Recognition Using Electro-Optical Stomatography. 2350-2351 - Petr Stanislav, Jan Svec, Pavel Ircing:
An Engine for Online Video Search in Large Archives of the Holocaust Testimonies. 2352-2353 - Piero Cosi, Giulio Paci, Giacomo Sommavilla, Fabio Tesser:
MIVOQ-PTTS - A Revolutionary New Way of Thinking TTS. 3888-3889
Robustness in Speech Processing
- Katerina Zmolíková, Martin Karafiát, Karel Veselý, Marc Delcroix, Shinji Watanabe, Lukás Burget, Jan Cernocký:
Data Selection by Sequence Summarizing Neural Network in Mismatch Condition Training. 2354-2358 - Souvik Kundu, Khe Chai Sim, Mark J. F. Gales:
Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition. 2359-2363 - Konstantin Markov, Tomoko Matsui:
Robust Speech Recognition Using Generalized Distillation Framework. 2364-2368 - Yusuke Shinohara:
Adversarial Multi-Task Learning of Deep Neural Networks for Robust Speech Recognition. 2369-2372 - Víctor Poblete, Juan Pablo Escudero, Josué Fredes, José Novoa, Richard M. Stern, Simon King, Néstor Becerra Yoma:
The Use of Locally Normalized Cepstral Coefficients (LNCC) to Improve Speaker Recognition Accuracy in Highly Reverberant Rooms. 2373-2377 - William Hartmann, Tim Ng, Roger Hsiao, Stavros Tsakalidis, Richard M. Schwartz:
Two-Stage Data Augmentation for Low-Resourced Speech Recognition. 2378-2382
Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception, Sincerity & Native Language
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron C. Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini:
The Native Language Sub-Challenge: The Data. - Avni Rajpal, Tanvina B. Patel, Hardik B. Sailor, Maulik C. Madhavi, Hemant A. Patil, Hiroya Fujisaki:
Native Language Identification Using Spectral and Source-Based Features. 2383-2387 - Yishan Jiao, Ming Tu, Visar Berisha, Julie M. Liss:
Accent Identification by Combining Deep Neural Networks and Recurrent Neural Networks Trained on Long and Short Term Features. 2388-2392 - Gil Keren, Jun Deng, Jouni Pohjalainen, Björn W. Schuller:
Convolutional Neural Networks with Data Augmentation for Classifying Speakers' Native Language. 2393-2397 - Mohammed Senoussaoui, Patrick Cardinal, Najim Dehak, Alessandro L. Koerich:
Native Language Detection Using the I-Vector Framework. 2398-2402 - Mark A. Huckvale:
Within-Speaker Features for Native Language Recognition in the Interspeech 2016 Computational Paralinguistics Challenge. 2403-2407 - Prashanth Gurunath Shivakumar, Sandeep Nallan Chakravarthula, Panayiotis G. Georgiou:
Multimodal Fusion of Multirate Acoustic, Prosodic, and Lexical Speaker Characteristics for Native Language Identification. 2408-2412 - Alberto Abad, Eugénio Ribeiro, Fábio N. Kepler, Ramón Fernandez Astudillo, Isabel Trancoso:
Exploiting Phone Log-Likelihood Ratio Features for the Detection of the Native Language of Non-Native English Speakers. 2413-2417 - Gábor Gosztolya, Tamás Grósz, Róbert Busa-Fekete, László Tóth:
Determining Native Language and Deception Using Phonetic Features and Classifier Combination. 2418-2422 - Björn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron C. Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini:
The INTERSPEECH 2016 Computational Paralinguistics Challenge: A Summary of Results. - Björn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron C. Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini:
Discussion.
Acoustic and Articulatory Phonetics
- Marija Tabain, Richard Beare:
A Preliminary Ultrasound Study of Nasal and Lateral Coronals in Arrernte. 2423-2427 - Asterios Toutios, Sajan Goud Lingala, Colin Vaz, Jangwon Kim, John H. Esling, Patricia A. Keating, Matthew Gordon, Dani Byrd, Louis Goldstein, Krishna S. Nayak, Shrikanth S. Narayanan:
Illustrating the Production of the International Phonetic Alphabet Sounds Using Fast Real-Time Magnetic Resonance Imaging. 2428-2432 - Margaret E. L. Renwick, Ioana Vasilescu, Camille Dutrey, Lori Lamel, Bianca Vieru:
Marginal Contrast Among Romanian Vowels: Evidence from ASR and Functional Load. 2433-2437 - Shuanglin Fan, Kiyoshi Honda, Jianwu Dang, Hui Feng:
Effects of Subglottal-Coupling and Interdental-Space on Formant Trajectories During Front-to-Back Vowel Transitions in Chinese. 2438-2442 - Mairym Lloréns Monteserín, Shrikanth S. Narayanan, Louis Goldstein:
Perceptual Lateralization of Coda Rhotic Production in Puerto Rican Spanish. 2443-2447 - Hao Yi, Sam Tilsen:
Interaction Between Lexical Tone and Intonation: An EMA Study. 2448-2452
Speech Synthesis Oral I: Neural Networks
- Huaiping Ming, Dong-Yan Huang, Lei Xie, Jie Wu, Minghui Dong, Haizhou Li:
Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion. 2453-2457 - Ausdang Thangthai, Ben Milner, Sarah Taylor:
Visual Speech Synthesis Using Dynamic Visemes, Contextual Features and DNNs. 2458-2462 - Srikanth Ronanki, Gustav Eje Henter, Zhizheng Wu, Simon King:
A Template-Based Approach for Speech Synthesis Intonation Generation Using LSTMs. 2463-2467 - Bo Li, Heiga Zen:
Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis. 2468-2472 - Manu Airaksinen, Bajibabu Bollepalli, Lauri Juvela, Zhizheng Wu, Simon King, Paavo Alku:
GlottDNN - A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis. 2473-2477 - Masanari Nishimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
Singing Voice Synthesis Based on Deep Neural Networks. 2478-2482
Speech Quality & Intelligibility
- Tom Bäckström, Florin Ghido, Johannes Fischer:
Blind Recovery of Perceptual Models in Distributed Speech and Audio Coding. 2483-2487 - Yan Tang, Martin Cooke:
Glimpse-Based Metrics for Predicting Speech Intelligibility in Additive Noise Conditions. 2488-2492 - Friedemann Köster, Sebastian Möller:
Analyzing the Relation Between Overall Quality and the Quality of Individual Phases in a Telephone Conversation. 2493-2497 - Emma Jokinen, Paavo Alku:
Intelligibility Enhancement at the Receiving End of the Speech Transmission System - Effects of Far-End Noise Reduction. 2498-2502 - Mario Ganzeboom, Marjoke Bakker, Catia Cucchiarini, Helmer Strik:
Intelligibility of Disordered Speech: Global and Detailed Scores. 2503-2507 - Maria Koutsogiannaki, Yannis Stylianou:
Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility in Noise. 2508-2512
Speech Translation and Metadata for Linguistic/Discourse Structure
- Jan Niehues, Thai Son Nguyen, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Müller, Matthias Sperber, Sebastian Stüker, Alex Waibel:
Dynamic Transcription for Low-Latency Speech Translation. 2513-2517 - Oliver Adams, Graham Neubig, Trevor Cohn, Steven Bird:
Learning a Translation Model from Word Lattices. 2518-2522 - Vicky Zayats, Mari Ostendorf, Hannaneh Hajishirzi:
Disfluency Detection Using a Bidirectional LSTM. 2523-2527 - Xiaoyin Che, Sheng Luo, Haojin Yang, Christoph Meinel:
Sentence Boundary Detection Based on Parallel Lexical and Acoustic Models. 2528-2532 - Quoc Truong Do, Sakriani Sakti, Graham Neubig, Satoshi Nakamura:
Transferring Emphasis in Speech Translation Using Hard-Attentional Neural Network Models. 2533-2537 - Ngoc-Tien Le, Christophe Servan, Benjamin Lecouteux, Laurent Besacier:
Better Evaluation of ASR in Speech Translation Context Using Word Embeddings. 2538-2542
Speech Coding and Audio Processing for Noise Reduction
- Srikanth Korse, Tobias Jähnel, Tom Bäckström:
Entropy Coding of Spectral Envelopes for Speech and Audio Coding Using Distribution Quantization. 2543-2547 - Stéphane Villette, Sen Li, Pravin Ramadas, Daniel J. Sinder:
An Objective Evaluation Methodology for Blind Bandwidth Extension. 2548-2552 - Anssi Rämö, Antti Kurittu, Henri Toukomaa:
EVS Channel Aware Mode Robustness to Frame Erasures. 2553-2557 - Shadi Pirhosseinloo, Kostas Kokkinakis:
An Interaural Magnification Algorithm for Enhancement of Naturally-Occurring Level Differences. 2558-2561 - Hendrik Kayser, Niko Moritz, Jörn Anemüller:
Probabilistic Spatial Filter Estimation for Signal Enhancement in Multi-Channel Automatic Speech Recognition. 2562-2566 - Youna Ji, Young-Cheol Park:
Improved a priori SAP Estimator in Complex Noisy Environment for Dual Channel Microphone System. 2567-2571 - Kah-Meng Cheong, Yuh-Yuan Wang, Tai-Shih Chi:
A Spectral Modulation Sensitivity Weighted Pre-Emphasis Filter for Active Noise Control System. 2572-2576 - Ganji Sreeram, Rohit Sinha:
Semi-Coupled Dictionary Based Automatic Bandwidth Extension Approach for Enhancing Children's ASR. 2577-2581
Special Session: Speech, Audio, and Language Processing Techniques Applied to Bird and Animal Vocalizations
- Jordi Bonada, Robert Lachlan, Merlijn Blaauw:
Bird Song Synthesis Based on Hidden Markov Models. 2582-2586 - Kantapon Kaewtip, Charles E. Taylor, Abeer Alwan:
Noise-Robust Hidden Markov Models for Limited Training Data for Within-Species Bird Phrase Classification. 2587-2591 - Alan Wisler, Laura J. Brattain, Rogier Landman, Thomas F. Quatieri:
A Framework for Automated Marmoset Vocalization Detection and Classification. 2592-2596 - Ikkyu Aihara, Takeshi Mizumoto, Hiromitsu Awano, Hiroshi G. Okuno:
Call Alternation Between Specific Pairs of Male Frogs Revealed by a Sound-Imaging Method in Their Natural Habitat. 2597-2601 - Patrice Guyot, Alice Eldridge, Ying Chen Eyre-Walker, Alison Johnston, Thomas Pellegrini, Mika Peck:
Sinusoidal Modelling for Ecoacoustics. 2602-2606 - Dan Stowell, Veronica Morfi, Lisa F. Gill:
Individual Identity in Songbirds: Signal Representations and Metric Learning for Locating the Information in Complex Corvid Calls. 2607-2611 - Peter Jancovic, Münevver Köküer:
Recognition of Multiple Bird Species Based on Penalised Maximum Likelihood and HMM-Based Modelling of Individual Vocalisation Elements. 2612-2616 - Ciira Wa Maina:
Cost Effective Acoustic Monitoring of Bird Species. 2617-2620 - Daniel Kohlsdorf, Denise Herzing, Thad Starner:
Feature Learning and Automatic Segmentation for Dolphin Communication Analysis. 2621-2625 - Reiji Suzuki, Shiho Matsubayashi, Kazuhiro Nakadai, Hiroshi G. Okuno:
Localizing Bird Songs Using an Open Source Robot Audition System with a Microphone Array. 2626-2630 - Frank Kurth:
Robust Detection of Multiple Bioacoustic Events with Repetitive Structures. 2631-2635 - Roger K. Moore:
A Real-Time Parametric General-Purpose Mammalian Vocal Synthesiser. 2636-2640 - Colm O'Reilly, Nicola M. Marples, David J. Kelly, Naomi Harte:
YIN-Bird: Improved Pitch Tracking for Bird Vocalisations. 2641-2645
Learning, Education and Different Speech
- Yao-Chi Hsu, Ming-Han Yang, Hsiao-Tsung Hung, Berlin Chen:
Mispronunciation Detection Leveraging Maximum Performance Criterion Training of Acoustic Models and Decision Functions. 2646-2650 - Peter A. Heeman, Rebecca Lunsford, Andy McMillin, J. Scott Yaruss:
Using Clinician Annotations to Improve Automatic Speech Recognition of Stuttered Speech. 2651-2655 - Simin Xie, Nan Yan, Ping Yu, Manwa L. Ng, Lan Wang, Zhuanzhuan Ji:
Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale. 2656-2660 - Lauren Ward, Alessandro Stefani, Daniel V. Smith, Andreas Duenser, Jill Freyne, Barbara Dodd, Angela Morgan:
Automated Screening of Speech Development Issues in Children by Identifying Phonological Error Patterns. 2661-2665 - Ju Lin, Yanlu Xie, Jinsong Zhang:
Automatic Pronunciation Evaluation of Non-Native Mandarin Tone by Using Multi-Level Confidence Measures. 2666-2670 - Myung Jong Kim, Jun Wang, Hoirin Kim:
Dysarthric Speech Recognition Using Kullback-Leibler Divergence-Based Hidden Markov Model. 2671-2675 - Anne S. Warlaumont, Heather L. Ramsdell-Hudock:
Detection of Total Syllables and Canonical Syllables in Infant Vocalizations. 2676-2680 - Duc Le, Emily Mower Provost:
Improving Automatic Recognition of Aphasic Speech with AphasiaBank. 2681-2685 - Vincent Laborde, Thomas Pellegrini, Lionel Fontan, Julie Mauclair, Halima Sahraoui, Jérôme Farinas:
Pronunciation Assessment of Japanese Learners of French with GOP Scores and Phonetic Information. 2686-2690 - Sean Robertson, Cosmin Munteanu, Gerald Penn:
Pronunciation Error Detection for New Language Learners. 2691-2695 - Hongwei Ding, Xinping Xu:
L2 English Rhythm in Read Speech by Chinese Students. 2696-2700
Dialogue Systems and Analysis of Dialogue
- Miao Li, Zhipeng Chen, Ji Wu:
Improving the Probabilistic Framework for Representing Dialogue Systems with User Response Model. 2701-2705 - Yiping Song, Lili Mou, Rui Yan, Li Yi, Zinan Zhu, Xiaohua Hu, Ming Zhang:
Dialogue Session Segmentation by Embedding-Enhanced TextTiling. 2706-2710 - Miao Li, Zhiyang He, Ji Wu:
Target-Based State and Tracking Algorithm for Spoken Dialogue System. 2711-2715 - Sheng-syun Shen, Hung-yi Lee:
Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection. 2716-2720 - Manoj Kumar, Rahul Gupta, Daniel Bone, Nikolaos Malandrakis, Somer Bishop, Shrikanth S. Narayanan:
Objective Language Feature Analysis in Children with Neurodevelopmental Disorders During Autism Assessment. 2721-2725 - Iñigo Casanueva, Thomas Hain, Phil D. Green:
Improving Generalisation to New Speakers in Spoken Dialogue State Tracking. 2726-2730 - Bo-Hsiang Tseng, Sheng-syun Shen, Hung-yi Lee, Lin-Shan Lee:
Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by Machine. 2731-2735
Topics in Speech Recognition
- Suman V. Ravuri, Steven Wegmann:
How Neural Network Depth Compensates for HMM Conditional Independence Assumptions in DNN-HMM Acoustic Models. 2736-2740 - Dimitri Palaz, Gabriel Synnaeve, Ronan Collobert:
Jointly Learning to Locate and Classify Words Using Convolutional Networks. 2741-2745 - Raziel Alvarez, Rohit Prabhavalkar, Anton Bakhtin:
On the Efficient Representation and Execution of Deep Acoustic Models. 2746-2750 - Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur:
Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI. 2751-2755 - Martin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf:
Virtual Adversarial Training Applied to Neural Higher-Order Factors for Phone Classification. 2756-2760 - Jeremy Heng Meng Wong, Mark J. F. Gales:
Sequence Student-Teacher Training of Deep Neural Networks. 2761-2765
Special Session: Realism in Robust Speech Processing
- John H. L. Hansen, Hynek Boril:
Robustness in Speech, Speaker, and Language Recognition: "You've Got to Know Your Limitations". 2766-2770 - Emma Jokinen, Ulpu Remes, Paavo Alku:
The Use of Read versus Conversational Lombard Speech in Spectral Tilt Modeling for Intelligibility Enhancement in Near-End Noise Conditions. 2771-2775 - Douglas E. Sturim, Pedro A. Torres-Carrasquillo, Joseph P. Campbell:
Corpora for the Evaluation of Robust Speaker Recognition Systems. 2776-2780 - Nancy Bertin, Ewen Camberlein, Emmanuel Vincent, Romain Lebarbenchon, Stéphane Peillon, Éric Lamande, Sunit Sivasankaran, Frédéric Bimbot, Irina Illina, Ariane Tom, Sylvain Fleury, Éric Jamet:
A French Corpus for Distant-Microphone Speech Processing in Real Homes. 2781-2785 - Mirco Ravanelli, Piergiorgio Svaizer, Maurizio Omologo:
Realistic Multi-Microphone Data Simulation for Distant Speech Recognition. 2786-2790 - Hannes Gamper, Mark R. P. Thomas, Lyle Corbin, Ivan Tashev:
Synthesis of Device-Independent Noise Corpora for Realistic ASR Evaluation. 2791-2795 - Fred Richardson, Michael S. Brandstein, Jennifer Melot, Douglas A. Reynolds:
Speaker Recognition Using Real vs Synthetic Parallel Data for DNN Channel Compensation. 2796-2800 - Dayana Ribas, Emmanuel Vincent, John H. L. Hansen, Emma Jokinen, Mirco Ravanelli, Hannes Gamper, Fred Richardson:
Discussion.
Spoken Word Recognition
- Louis ten Bosch, Lou Boves, Mirjam Ernestus:
Combining Data-Oriented and Process-Oriented Approaches to Modeling Reaction Time Data. 2801-2805 - Michael McAuliffe, Molly Babel, Charlotte Vaughn:
Do Listeners Learn Better from Natural Speech? 2806-2810 - Polina Drozdova, Roeland van Hout, Odette Scharenborg:
Processing and Adaptation to Ambiguous Sounds during the Course of Perceptual Learning. 2811-2815 - Florian Hintz, Odette Scharenborg:
The Effect of Background Noise on the Activation of Phonological and Semantic Information During Spoken-Word Recognition. 2816-2820 - Shinae Kang, Clara Cohen:
Relationships Between Functional Load and Auditory Confusability Under Different Speech Environments. 2821-2825 - Jasmeen Kanwal, Amanda Ritchart:
The Role of Pitch in Punjabi Word Identification. 2826-2830
Speech Synthesis Oral: High Level Linguistic Features
- Marie Tahon, Raheel Qader, Gwénolé Lecorvé, Damien Lolive:
Improving TTS with Corpus-Specific Pronunciation Adaptation. 2831-2835 - Amr El-Desoky Mousa, Björn W. Schuller:
Deep Bidirectional Long Short-Term Memory Recurrent Neural Networks for Grapheme-to-Phoneme Conversion Utilizing Complex Many-to-Many Alignments. 2836-2840 - Daan van Esch, Mason Chua, Kanishka Rao:
Predicting Pronunciations with Syllabification and Stress with Recurrent Neural Networks. 2841-2845 - Maël Pouget, Olha Nahorna, Thomas Hueber, Gérard Bailly:
Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech Synthesis. 2846-2850 - Rasmus Dall, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
Redefining the Linguistic Context Feature Set for HMM and DNN TTS Through Position and Parsing. 2851-2855 - Xin Wang, Shinji Takaki, Junichi Yamagishi:
Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System. 2856-2860
Speech Enhancement
- Kwang Myung Jeon, Hong Kook Kim:
Local Sparsity Based Online Dictionary Learning for Environment-Adaptive Speech Enhancement with Nonnegative Matrix Factorization. 2861-2865 - Pavlos Papadopoulos, Colin Vaz, Shrikanth S. Narayanan:
Noise Aware and Combined Noise Models for Speech Denoising in Unknown Noise Conditions. 2866-2869 - Seyedmahdad Mirsamadi, Ivan Tashev:
Causal Speech Enhancement Combining Data-Driven Learning and Suppression Rule Estimation. 2870-2874 - Alessio Brutti, Antigoni Tsiami, Athanasios Katsamanis, Petros Maragos:
A Phase-Based Time-Frequency Masking for Multi-Channel Speech Enhancement in Domestic Environments. 2875-2879 - Petko Nikolov Petkov, Yannis Stylianou:
Generalizing Steady State Suppression for Enhanced Intelligibility Under Reverberation. 2880-2884 - Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani:
Speech Intelligibility Prediction Based on the Envelope Power Spectrum Model with the Dynamic Compressive Gammachirp Auditory Filterbank. 2885-2889
Dialogue: Backchannels and Turntaking
- Tatsuya Kawahara, Takashi Yamaguchi, Koji Inoue, Katsuya Takanashi, Nigel G. Ward:
Prediction and Generation of Backchannel Form for Attentive Listening Systems. 2890-2894 - Rebecca Lunsford, Peter A. Heeman, Emma Rennie:
Measuring Turn-Taking Offsets in Human-Human Dialogues. 2895-2899 - Tomer Meshorer, Peter A. Heeman:
Using Past Speaker Behavior to Better Predict Turn Transitions. 2900-2904 - Gérard Bailly, Frédéric Elisei, Alexandra Juphard, Olivier Moreaud:
Quantitative Analysis of Backchannels Uttered by an Interviewer During Neuropsychological Tests. 2905-2909 - Shammur Absar Chowdhury, Evgeny A. Stepanov, Giuseppe Riccardi:
Predicting User Satisfaction from Turn-Taking in Spoken Conversations. 2910-2914 - Catharine Oertel, Joakim Gustafson, Alan W. Black:
Towards Building an Attentive Artificial Listener: On the Perception of Attentiveness in Feedback Utterances. 2915-2919
Language Recognition
- Youngjune L. Gwon, William M. Campbell, Douglas E. Sturim, H. T. Kung:
Language Recognition via Sparse Coding. 2920-2924 - Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah:
A Feature Normalisation Technique for PLLR Based Language Identification Systems. 2925-2929 - Mounika K. V., Sivanand Achanta, Lakshmi H. R., Suryakanth V. Gangashetty, Anil Kumar Vuppala:
An Investigation of Deep Neural Network Architectures for Language Recognition in Indian Languages. 2930-2933 - Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James R. Glass, Peter Bell, Steve Renals:
Automatic Dialect Detection in Arabic Broadcast Speech. 2934-2938 - Raymond W. M. Ng, Bhusan Chettri, Thomas Hain:
Combining Weak Tokenisers for Phonotactic Language Recognition in a Resource-Constrained Setting. 2939-2943 - Wang Geng, Wenfu Wang, Yuanyuan Zhao, Xinyuan Cai, Bo Xu:
End-to-End Language Identification Using Attention-Based Recurrent Neural Networks. 2944-2948 - Hesam Sagha, Pavel Matejka, Maryna Gavryukova, Filip Povolný, Erik Marchi, Björn W. Schuller:
Enhancing Multilingual Recognition of Emotion in Speech by Language Identification. 2949-2953
Speech and Audio Segmentation and Classification
- Seongkyu Mun, Suwon Shon, Wooil Kim, Hanseok Ko:
Deep Neural Network Bottleneck Features for Acoustic Event Recognition. 2954-2957 - Antonio Origlia, Francesco Cutugno:
Combining Energy and Cross-Entropy Analysis for Nuclear Segments Detection. 2958-2962 - Roland Maas, Sree Hari Krishnan Parthasarathi, Brian John King, Ruitong Huang, Björn Hoffmeister:
Anchored Speech Detection. 2963-2967 - Mahesh Kumar Nandwana, Taufiq Hasan:
Towards Smart-Cars That Can Listen: Abnormal Acoustic Event Detection on the Road. 2968-2971 - K. V. Vijay Girish, A. G. Ramakrishnan, T. V. Ananthapadmanabha:
Hierarchical Classification of Speaker and Background Noise and Estimation of SNR Using Sparse Representation. 2972-2976 - Haomin Zhang, Ian McLoughlin, Yan Song:
Robust Sound Event Detection in Continuous Audio Environments. 2977-2981 - Naoya Takahashi, Michael Gygli, Beat Pfister, Luc Van Gool:
Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Recognition. 2982-2986 - Stefan Meier, Walter Kellermann:
Artificial Neural Network-Based Feature Combination for Spatial Voice Activity Detection. 2987-2991 - Tomi Kinnunen, Alexey Sholokhov, Elie Khoury, Dennis Alexander Lehmann Thomsen, Md. Sahidullah, Zheng-Hua Tan:
HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors. 2992-2996 - Florian B. Pokorny, Robert Peharz, Wolfgang Roth, Matthias Zöhrer, Franz Pernkopf, Peter B. Marschik, Björn W. Schuller:
Manual versus Automated: The Challenging Routine of Infant Vocalisation Segmentation in Home Videos to Study Neuro(mal)development. 2997-3001 - Luciana Ferrer, Martin Graciarena:
Minimizing Annotation Effort for Adaptation of Speech-Activity Detection Systems. 3002-3006
New Products and Services
- Roger K. Moore, Hui Li, Shih-Hao Liao:
Progress and Prospects for Spoken Language Technology: What Ordinary People Think. 3007-3011 - Roger K. Moore, Ricard Marxer:
Progress and Prospects for Spoken Language Technology: Results from Four Sexennial Surveys. 3012-3016 - Purushotam G. Radadia, Rahul Kumar, Kanika Kalra, Shirish Karande, Sachin Lodha:
On Employing a Highly Mismatched Crowd for Speech Transcription. 3017-3021 - Roger Hsiao, Ralf Meermeier, Tim Ng, Zhongqiang Huang, Maxwell Jordan, Enoch Kan, Tanel Alumäe, Jan Silovský, William Hartmann, Francis Keith, Omer Lang, Man-Hung Siu, Owen Kimball:
Sage: The New BBN Speech Processing Platform. 3022-3026 - Kang Hyun Lee, Tae Gyoon Kang, Woo Hyun Kang, Nam Soo Kim:
DNN-Based Feature Enhancement Using Joint Training Framework for Robust Multichannel Speech Recognition. 3027-3031 - Michael Wand, Jürgen Schmidhuber:
Deep Neural Network Frontend for Continuous EMG-Based Speech Recognition. 3032-3036 - Basil Abraham, Srinivasan Umesh, Neethu Mariam Joy:
Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages. 3037-3041 - Anton Ragni, Edgar Dakin, Xie Chen, Mark J. F. Gales, Kate M. Knill:
Multi-Language Neural Network Language Models. 3042-3046 - Ottokar Tilk, Tanel Alumäe:
Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration. 3047-3051 - Seppo Enarvi, Mikko Kurimo:
TheanoLM - An Extensible Toolkit for Neural Network Language Modeling. 3052-3056 - Pierre Lanchantin, Mark J. F. Gales, Penny Karanasou, Xunying Liu, Yanmin Qian, Linlin Wang, Philip C. Woodland, Chao Zhang:
Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems. 3057-3061 - Yashesh Gaur, Florian Metze, Jeffrey P. Bigham:
Manipulating Word Lattices to Incorporate Human Corrections. 3062-3065 - Philipp Fischer, Cornelius Styp von Rekowski, Andreas Nürnberger:
Context-Aware Restaurant Recommendation for Natural Language Queries: A Formative User Study in the Automotive Domain. 3066-3070 - Stephanie Pancoast, Murat Akbacak:
Teaming Up: Making the Most of Diverse Representations for a Novel Personalized Speech Retrieval Application. 3071-3075 - Vikramjit Mitra, Andreas Kathol, Jonathan D. Amith, Rey Castillo García:
Automatic Speech Transcription for Low-Resource Languages - The Case of Yoloxóchitl Mixtec (Mexico). 3076-3080 - Reza Asadi, Harriet J. Fell, Timothy W. Bickmore, Ha Trinh:
Real-Time Presentation Tracking Using Semantic Keyword Spotting. 3081-3085
Low Resource Speech Recognition
- Andrew Wilkinson, Tiancheng Zhao, Alan W. Black:
Deriving Phonetic Transcriptions and Discovering Word Segmentations for Speech-to-Speech Translation in Low-Resource Settings. 3086-3090 - Satoshi Tsujioka, Sakriani Sakti, Koichiro Yoshino, Graham Neubig, Satoshi Nakamura:
Unsupervised Joint Estimation of Grapheme-to-Phoneme Conversion Systems and Acoustic Model Adaptation for Non-Native Speech Recognition. 3091-3095 - Antoine Bruguier, Fuchun Peng, Françoise Beaufays:
Learning Personalized Pronunciations for Contact Name Recognition. 3096-3100 - Zhenhao Ge, Aravind Ganapathiraju, Ananth N. Iyer, Scott A. Randal, Felix I. Wyss:
Generation and Pruning of Pronunciation Variants to Improve ASR Accuracy. 3101-3105 - Janne Pylkkönen, Thomas Drugman, Max Bisani:
Optimizing Speech Recognition Evaluation Using Stratified Sampling. 3106-3110
Keynote 4: Dan Jurafsky
- Dan Jurafsky:
Ketchup, Interdisciplinarity, and the Spread of Innovation in Speech and Language Processing. 3111
Special Event: Speech Ventures
- Nicolas Scheffer, Korbinian Riedhammer, Alexandre Lebrun, David Suendermann-Oeft:
Speech Ventures.
Special Session: Speech and Language Technologies for Human-Machine Conversation-Based Language Education
- Rong Tong, Nancy F. Chen, Bin Ma, Haizhou Li:
Context Aware Mispronunciation Detection for Mandarin Pronunciation Training. 3112-3116 - Jidong Tao, Lei Chen, Chong Min Lee:
DNN Online with iVectors Acoustic Modeling and Doc2Vec Distributed Representations for Improving Automated Speech Scoring. 3117-3121 - Yao Qian, Xinhao Wang, Keelan Evanini, David Suendermann-Oeft:
Self-Adaptive DNN for Improving Spoken Language Proficiency Assessment. 3122-3126 - Wei Li, Kehuang Li, Sabato Marco Siniscalchi, Nancy F. Chen, Chin-Hui Lee:
Detecting Mispronunciations of L2 Learners and Providing Corrective Feedback Using Knowledge-Guided and Data-Driven Decision Trees. 3127-3131 - Xiaoyun Wang, Tsuneo Kato, Seiichi Yamamoto:
Phoneme Set Design Considering Integrated Acoustic and Linguistic Features of Second Language Speech. 3132-3136 - Ramya Rasipuram, Milos Cernak, Mathew Magimai-Doss:
HMM-Based Non-Native Accent Assessment Using Posterior Features. 3137-3141 - Shuju Shi, Yosuke Kashiwagi, Shohei Toyama, Junwei Yue, Yutaka Yamauchi, Daisuke Saito, Nobuaki Minematsu:
Automatic Assessment and Error Detection of Shadowing Speech: Case of English Spoken by Japanese Learners. 3142-3146
Phonation and Voice Quality
- Mísa Hejná:
Multiplicity of the Acoustic Correlates of the Fortis-Lenis Contrast: Plosives in Aberystwyth English. 3147-3151 - Yossi Adi, Joseph Keshet, Olga Dmitrieva, Matthew Goldrick:
Automatic Measurement of Voice Onset Time and Prevoicing Using Recurrent Neural Networks. 3152-3155 - Sucheta Ghosh, Camille Fauth, Aghilas Sini, Yves Laprie:
L1-L2 Interference: The Case of Final Devoicing of French Voiced Fricatives in Final Position by German Learners. 3156-3160 - Irena Yanushevskaya, Andy Murphy, Christer Gobl, Ailbhe Ní Chasaide:
Perceptual Salience of Voice Source Parameters in Signaling Focal Prominence. 3161-3165 - Michal Borsky, Daryush D. Mehta, Julius P. Gudjohnsen, Jón Guðnason:
Classification of Voice Modality Using Electroglottogram Waveforms. 3166-3170 - Kikuo Maekawa, Hiroki Mori:
Voice-Quality Difference Between the Vowels in Filled Pauses and Ordinary Lexical Items. 3171-3175
Speech Synthesis Oral: Prosody and Expressive Speech
- Yan-You Chen, Chung-Hsien Wu, Yu-Fong Huang:
Generation of Emotion Control Vector Using MDS-Based Space Transformation for Expressive Speech Synthesis. 3176-3180 - Igor Jauk, Antonio Bonafonte:
Direct Expressive Voice Training Based on Semantic Selection. 3181-3185 - Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi:
Syllable-Level Representations of Suprasegmental Features for DNN-Based Text-to-Speech Synthesis. 3186-3190 - Norbert Braunschweiler, Ranniery Maia:
Pause Prediction from Text for Speech Synthesis with User-Definable Pause Insertion Likelihood Threshold. 3191-3195 - Quoc Truong Do, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
A Hybrid System for Continuous Word-Level Emphasis Modeling Based on HMM State Clustering and Adaptive Training. 3196-3200 - Yibin Zheng, Ya Li, Zhengqi Wen, Xingguang Ding, Jianhua Tao:
Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach. 3201-3205
Language Recognition
- Hui Zhao, Désiré Bansé, George R. Doddington, Craig S. Greenberg, Jaime Hernandez-Cordero, John M. Howard, Lisa P. Mason, Alvin F. Martin, Douglas A. Reynolds, Elliot Singer, Audrey Tong:
Results of The 2015 NIST Language Recognition Evaluation. 3206-3210 - Kong-Aik Lee, Haizhou Li, Li Deng, Ville Hautamäki, Wei Rao, Xiong Xiao, Anthony Larcher, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Jianshu Chen, Ivan Kukanov, Amir Hossein Poorjam, Trung Ngo Trong, Chenglin Xu, Haihua Xu, Bin Ma, Eng Siong Chng, Sylvain Meignier:
The 2015 NIST Language Recognition Evaluation: The Shared View of I2R, Fantastic4 and SingaMS. 3211-3215 - Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai:
Pair-Wise Distance Metric Learning of Neural Network Model for Spoken Language Identification. 3216-3220 - Ruchir Travadi, Shrikanth S. Narayanan:
Non-Iterative Parameter Estimation for Total Variability Model Using Randomized Singular Value Decomposition. 3221-3225 - Daniel Garcia-Romero, Alan McCree:
Stacked Long-Term TDNN for Spoken Language Recognition. 3226-3230 - Gregory Gelly, Jean-Luc Gauvain, Viet Bac Le, Abdelkhalek Messaoudi:
A Divide-and-Conquer Approach for Language Identification Based on Recurrent Neural Networks. 3231-3235
Spoken Language Understanding Systems
- Chiori Hori, Takaaki Hori, Shinji Watanabe, John R. Hershey:
Context-Sensitive and Role-Dependent Spoken Language Understanding Using Bidirectional and Attention LSTMs. 3236-3240 - Vedran Vukotic, Christian Raymond, Guillaume Gravier:
A Step Beyond Local Observations with a Dialog Aware Bidirectional GRU Network for Spoken Language Understanding. 3241-3244 - Yun-Nung Chen, Dilek Hakkani-Tür, Gökhan Tür, Jianfeng Gao, Li Deng:
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding. 3245-3249 - Ngoc Thang Vu:
Sequential Convolutional Neural Networks for Slot Filling in Spoken Language Understanding. 3250-3254 - Asli Celikyilmaz, Ruhi Sarikaya, Dilek Hakkani-Tür, Xiaohu Liu, Nikhil Ramesh, Gökhan Tür:
A New Pre-Training Method for Training Deep Learning Models with Application to Spoken Language Understanding. 3255-3259 - Jérémie Tafforeau, Frédéric Béchet, Thierry Artières, Benoît Favre:
Joint Syntactic and Semantic Analysis with a Multitask Deep Learning Framework for Spoken Language Understanding. 3260-3264
Language Recognition
- Ruizhi Li, Sri Harish Reddy Mallidi, Lukás Burget, Oldrich Plchot, Najim Dehak:
Exploiting Hidden-Layer Responses of Deep Neural Networks for Language Recognition. 3265-3269 - Saad Irtza, Vidhyasaharan Sethu, Sarith Fernando, Eliathamby Ambikairajah, Haizhou Li:
Out of Set Language Modelling in Hierarchical Language Identification. 3270-3274 - Ryo Masumura, Taichi Asami, Hirokazu Masataki, Yushi Aono, Sumitaka Sakauchi:
Language Identification Based on Generative Modeling of Posteriorgram Sequences Extracted from Frame-by-Frame DNNs and LSTM-RNNs. 3275-3279 - Wang Geng, Yuanyuan Zhao, Wenfu Wang, Xinyuan Cai, Bo Xu:
Gating Recurrent Enhanced Memory Neural Networks on Language Identification. 3280-3284 - Jan Pesán, Lukás Burget, Jan Cernocký:
Sequence Summarizing Neural Networks for Spoken Language Recognition. 3285-3288 - Michelle R. Kapolowicz, Vahid Montazeri, Peter F. Assmann:
The Role of Spectral Resolution in Foreign-Accented Speech Perception. 3289-3293 - Liang He, Yao Tian, Yi Liu, Jiaming Xu, Weiwei Liu, Cai Meng, Jia Liu:
THU-EE System Description for NIST LRE 2015. 3294-3298 - Kristiina Jokinen, Trung Ngo Trong, Ville Hautamäki:
Variation in Spoken North Sami Language. 3299-3303
Music, Audio, and Source Separation
- Weibin Zhang, Wenkang Lei, Xiangmin Xu, Xiaofeng Xing:
Improved Music Genre Classification with Convolutional Neural Networks. 3304-3308 - Gurunath Reddy M., K. Sreenivasa Rao:
Enhanced Harmonic Content and Vocal Note Based Predominant Melody Extraction from Vocal Polyphonic Music Signals. 3309-3313 - Jitong Chen, DeLiang Wang:
Long Short-Term Memory for Speaker Generalization in Supervised Speech Separation. 3314-3318 - Anna M. Kruspe:
Phonotactic Language Identification for Singing. 3319-3323 - Thomas Bentsen, Tobias May, Abigail A. Kressner, Torsten Dau:
Comparing the Influence of Spectro-Temporal Integration in Computational Speech Segregation. 3324-3328 - Sean U. N. Wood, Jean Rouat:
Blind Speech Separation with GCC-NMF. 3329-3333 - Vahid Montazeri, Shaikat Hossain, Peter F. Assmann:
Effects of Cochlear Hearing Loss on the Benefits of Ideal Binary Masking. 3334-3338 - Emad M. Grais, Gerard Roma, Andrew J. R. Simpson, Mark D. Plumbley:
Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks. 3339-3343 - Cosimo Riday, Saurabh Bhargava, Richard H. R. Hahnloser, Shih-Chii Liu:
Monaural Source Separation Using a Random Forest Classifier. 3344-3348 - Xu Li, Ziteng Wang, Xiaofei Wang, Qiang Fu, Yonghong Yan:
Adaptive Group Sparsity for Non-Negative Matrix Factorization with Application to Unsupervised Source Separation. 3349-3353 - Yanmeng Guo, Xiaofei Wang, Chao Wu, Qiang Fu, Ning Ma, Guy J. Brown:
A Robust Dual-Microphone Speech Source Localization Algorithm for Reverberant Environments. 3354-3358 - Ning Ma, Guy J. Brown:
Speech Localisation in a Multitalker Mixture by Humans and Machines. 3359-3363 - Sundar Harshavardhan, Gokul Deepak Manavalan, T. V. Sreenivas, Chandra Sekhar Seelamantula:
Reverberation-Robust One-Bit TDOA Based Moving Source Localization for Automatic Camera Steering. 3364-3368 - Keiko Ochi, Nobutaka Ono, Shigeki Miyabe, Shoji Makino:
Multi-Talker Speech Recognition Based on Blind Source Separation with ad hoc Microphone Array Using Smartphones and Cloud Storage. 3369-3373
Acoustic Modeling with Neural Networks
- Johannes Fahringer, Tobias Schrank, Johannes Stahl, Pejman Mowlaee, Franz Pernkopf:
Phase-Aware Signal Processing for Automatic Speech Recognition. 3374-3378 - Hardik B. Sailor, Hemant A. Patil:
Unsupervised Deep Auditory Model Using Stack of Convolutional RBMs for Speech Recognition. 3379-3383 - Philip Weber, Linxue Bai, Martin J. Russell, Peter Jancovic, Stephen M. Houghton:
Interpretation of Low Dimensional Neural Network Bottleneck Features in Terms of Human Perception and Production. 3384-3388 - Shiliang Zhang, Hui Jiang, Shifu Xiong, Si Wei, Li-Rong Dai:
Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition. 3389-3393 - Jian Tang, Shiliang Zhang, Si Wei, Li-Rong Dai:
Future Context Attention for Unidirectional LSTM Based Acoustic Model. 3394-3398 - Jen-Tzung Chien, Pei-Wen Huang, Tan Lee:
Hybrid Accelerated Optimization for Speech Recognition. 3399-3403 - William Chan, Ian R. Lane:
On Online Attention-Based Speech Recognition and Joint Mandarin Character-Pinyin Training. 3404-3408 - Gábor Gosztolya, Tamás Grósz, László Tóth:
GMM-Free Flat Start Sequence-Discriminative DNN Training. 3409-3413 - Yajie Miao, Florian Metze:
Open-Domain Audio-Visual Speech Recognition: A Deep Learning Approach. 3414-3418 - Yuanyuan Zhao, Shuang Xu, Bo Xu:
Multidimensional Residual Learning Based on Recurrent Neural Networks for Acoustic Modeling. 3419-3423 - Albert Zeyer, Ralf Schlüter, Hermann Ney:
Towards Online-Recognition with Deep Bidirectional LSTM Acoustic Models. 3424-3428 - Tom Sercu, Vaibhava Goel:
Advances in Very Deep Convolutional Neural Networks for LVCSR. 3429-3433 - Pegah Ghahremani, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur:
Acoustic Modelling from the Signal Domain Using CNNs. 3434-3438 - Yevgen Chebotar, Austin Waters:
Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition. 3439-3443 - Weiran Wang, Hao Tang, Karen Livescu:
Triphone State-Tying via Deep Canonical Correlation Analysis. 3444-3448 - Gil Luyet, Pranay Dighe, Afsaneh Asaei, Hervé Bourlard:
Low-Rank Representation of Nearest Neighbor Posterior Probabilities to Enhance DNN Based Acoustic Modeling. 3449-3453
Robustness and Adaptation
- Hao Zheng, Shanshan Zhang, Liwei Qiao, Jianping Li, Wenju Liu:
Improving Large Vocabulary Accented Mandarin Speech Recognition with Attribute-Based I-Vectors. 3454-3458 - Syed Shahnawazuddin, Abhishek Dey, Rohit Sinha:
Pitch-Adaptive Front-End Features for Robust Children's ASR. 3459-3463 - Miguel Ángel del Agua, Santiago Piqueras, Adrià Giménez, Alberto Sanchís, Jorge Civera, Alfons Juan:
ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks. 3464-3468 - Luis Fernando D'Haro, Rafael E. Banchs:
Automatic Correction of ASR Outputs by Using Machine Translation. 3469-3473 - Sri Harish Reddy Mallidi, Hynek Hermansky:
A Framework for Practical Multistream ASR. 3474-3478 - Neethu Mariam Joy, Murali Karthick Baskar, Srinivasan Umesh, Basil Abraham:
DNNs for Unsupervised Extraction of Pseudo FMLLR Features Without Explicit Adaptation Data. 3479-3483 - Lahiru Samarakoon, Khe Chai Sim:
Multi-Attribute Factorized Hidden Layer Adaptation for DNN Acoustic Models. 3484-3488 - Jahyun Goo, Younggwan Kim, Hyungjun Lim, Hoirin Kim:
Speaker Normalization Through Feature Shifting of Linearly Transformed i-Vector. 3489-3493
Special Event: Computational Approaches to Linguistic Code Switching
- Mona T. Diab, Pascale Fung, Julia Hirschberg, Thamar Solorio:
Computational Approaches to Linguistic Code Switching.
Neural Networks for Language Modeling
- Ebru Arisoy, Murat Saraclar:
Compositional Neural Network Language Models for Agglutinative Languages. 3494-3498 - Babak Damavandi, Shankar Kumar, Noam Shazeer, Antoine Bruguier:
NN-Grams: Unifying Neural Network and n-Gram Language Models for Speech Recognition. 3499-3503 - Md. Akmal Haidar, Mikko Kurimo:
Recurrent Neural Network Language Model with Incremental Updated Context Information Generated Using Bag-of-Words Representation. 3504-3508 - Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow:
Sequential Recurrent Neural Networks for Language Modeling. 3509-3513 - Michael Levit, Sarangarajan Parthasarathy, Shuangyu Chang:
Word-Phrase-Entity Recurrent Neural Networks for Language Modeling. 3514-3518 - Kazuki Irie, Zoltán Tüske, Tamer Alkhouli, Ralf Schlüter, Hermann Ney:
LSTM, GRU, Highway and a Bit of Attention: An Empirical Overview for Language Modeling in Speech Recognition. 3519-3523
Special Session: Sub-Saharan African Languages: From Speech Fundamentals to Applications
- Amit Das, Preethi Jyothi, Mark Hasegawa-Johnson:
Automatic Speech Recognition Using Probabilistic Transcriptions in Swahili, Amharic, and Dinka. 3524-3528 - Elodie Gauthier, Laurent Besacier, Sylvie Voisin:
Speed Perturbation and Vowel Duration Modeling for ASR in Hausa and Wolof Languages. 3529-3533 - Charl Johannes van Heerden, Neil Kleynhans, Marelie H. Davel:
Improving the Lwazi ASR Baseline. 3534-3538 - Pierre Godard, Gilles Adda, Martine Adda-Decker, Alexandre Allauzen, Laurent Besacier, Hélène Bonneau-Maynard, Guy-Noël Kouarata, Kevin Löser, Annie Rialland, François Yvon:
Preliminary Experiments on Unsupervised Word Discovery in Mboshi. 3539-3543 - Marco Vetter, Markus Müller, Fatima Hamlaoui, Graham Neubig, Satoshi Nakamura, Sebastian Stüker, Alex Waibel:
Unsupervised Phoneme Segmentation of Previously Unseen Languages. 3544-3548 - Céline Manenti, Thomas Pellegrini, Julien Pinquier:
CNN-Based Phone Segmentation Experiments in a Less-Represented Language. 3549-3553 - Georg I. Schlünz, Nkosikhona Dlamini, Rynhardt P. Kruger:
Part-of-Speech Tagging and Chunking in Text-to-Speech Synthesis for South African Languages. 3554-3558 - Ewald van der Westhuizen, Thomas Niesler:
The Effect of Postlexical Deletion on Automatic Speech Recognition in Fast Spontaneously Spoken Zulu. 3559-3563
Speech Production Models
- Vikram Ramanarayanan, Benjamin Parrell, Louis Goldstein, Srikantan S. Nagarajan, John F. Houde:
A New Model of Speech Motor Control Based on Task Dynamics and State Feedback. 3564-3568 - Saeed Dabbaghchian, Marc Arnela, Olov Engwall, Oriol Guasch, Ian Stavness, Pierre Badin:
Using a Biomechanical Model and Articulatory Data for the Numerical Production of Vowels. 3569-3573 - Jianguo Wei, Wendan Guan, Darcy Q. Hou, Dingyi Pan, Wenhuan Lu, Jianwu Dang:
A New Model for Acoustic Wave Propagation and Scattering in the Vocal Tract. 3574-3578 - Andrew Szabados, Pascal Perrier:
Uncontrolled Manifolds in Vowel Production: Assessment with a Biomechanical Model of the Tongue. 3579-3583 - Tsukasa Yoshinaga, Kazunori Nozaki, Shigeo Wada:
Experimental Validation of Sound Generated from Flow in Simplified Vocal Tract Model of Sibilant /s/. 3584-3587 - Jean-François Patri, Pascal Perrier, Julien Diard:
Bayesian Modeling in Speech Motor Control: A Principled Structure for the Integration of Various Constraints. 3588-3592
Speaker States and Traits
- Zixing Zhang, Fabien Ringeval, Jing Han, Jun Deng, Erik Marchi, Björn W. Schuller:
Facing Realism in Spontaneous Emotion Recognition from Speech: Feature Enhancement by Autoencoder with LSTM Neural Networks. 3593-3597 - Srinivas Parthasarathy, Carlos Busso:
Defining Emotionally Salient Regions Using Qualitative Agreement Method. 3598-3602 - Sayan Ghosh, Eugene Laksana, Louis-Philippe Morency, Stefan Scherer:
Representation Learning for Speech Emotion Recognition. 3603-3607 - Xingfeng Li, Masato Akagi:
Multilingual Speech Emotion Recognition System Based on a Three-Layer Model. 3608-3612 - Ozlem Kalinli:
Analysis of Multi-Lingual Emotion Recognition Using Auditory Attention Features. 3613-3617 - Haytham M. Fayek, Margaret Lech, Lawrence Cavedon:
On the Correlation and Transferability of Features Between Automatic Speech Recognition and Speech Emotion Recognition. 3618-3622
Speaker Recognition
- Giacomo Valenti, Adrien Daniel, Nicholas W. D. Evans:
On the Influence of Text Content on Pass-Phrase Strength for Short-Duration Text-Dependent Automatic Speaker Authentication. 3623-3627 - Massimiliano Todisco, Héctor Delgado, Nicholas W. D. Evans:
Articulation Rate Filtering of CQCC Features for Automatic Speaker Verification. 3628-3632 - Seyed Omid Sadjadi, Jason W. Pelecanos, Sriram Ganapathy:
The IBM Speaker Recognition System: Recent Advances and Error Analysis. 3633-3637 - Waad Ben Kheder, Driss Matrouf, Moez Ajili, Jean-François Bonastre:
Probabilistic Approach Using Joint Clean and Noisy i-Vectors Modeling for Speaker Recognition. 3638-3642 - Fahimeh Bahmaninezhad, John H. L. Hansen:
Generalized Discriminant Analysis (GDA) for Improved i-Vector Based Speaker Recognition. 3643-3647 - Yao Qian, Jidong Tao, David Suendermann-Oeft, Keelan Evanini, Alexei V. Ivanov, Vikram Ramanarayanan:
Noise and Metadata Sensitive Bottleneck Features for Improving Speaker Recognition with Non-Native Speech Input. 3648-3652
VAD and Audio Events
- Huy Phan, Lars Hertel, Marco Maaß, Alfred Mertins:
Robust Audio Event Recognition with 1-Max Pooling Convolutional Neural Networks. 3653-3657 - Giannis Karamanolakis, Elias Iosif, Athanasia Zlatintsi, Aggelos Pikrakis, Alexandros Potamianos:
Audio-Based Distributional Representations of Meaning Using a Fusion of Feature Encodings. 3658-3662 - Yuya Fujita, Ken-ichi Iso:
Robust DNN-Based VAD Augmented with Phone Entropy Based Rejection of Background Speech. 3663-3667 - Rubén Zazo, Tara N. Sainath, Gabor Simko, Carolina Parada:
Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection. 3668-3672 - Martin Graciarena, Luciana Ferrer, Vikramjit Mitra:
The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation. 3673-3677 - Damianos G. Karakos, Scott Novotney, Le Zhang, Richard M. Schwartz:
Model Adaptation and Active Learning in the BBN Speech Activity Detection System for the DARPA RATS Program. 3678-3682
Spoken Term Detection
- Vikramjit Mitra, Julien van Hout, Wen Wang, Chris Bartels, Horacio Franco, Dimitra Vergyri, Abeer Alwan, Adam Janin, John H. L. Hansen, Richard M. Stern, Abhijeet Sangwan, Nelson Morgan:
Fusion Strategies for Robust Speech Recognition and Keyword Spotting for Channel- and Noise-Degraded Speech. 3683-3687 - Naoki Sawada, Hiromitsu Nishizaki:
Recurrent Neural Network-Based Phoneme Sequence Estimation Using Multiple ASR Systems' Outputs for Spoken Term Detection. 3688-3692 - Mark Kane, Julie Carson-Berndsen:
Enhancing Data-Driven Phone Confusions Using Restricted Recognition. 3693-3697 - Chongjia Ni, Lei Wang, Cheung-Chi Leung, Feng Rao, Li Lu, Bin Ma, Haizhou Li:
Rapid Update of Multilingual Deep Neural Network for Low-Resource Keyword Search. 3698-3702 - Cheung-Chi Leung, Lei Wang, Haihua Xu, Jingyong Hou, Van Tung Pham, Hang Lv, Lei Xie, Xiong Xiao, Chongjia Ni, Bin Ma, Eng Siong Chng, Haizhou Li:
Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis. 3703-3707
Speech Enhancement and Noise Reduction
- Meet H. Soni, Hemant A. Patil:
Novel Subband Autoencoder Features for Non-Intrusive Quality Assessment of Noise Suppressed Speech. 3708-3712 - Tian Gao, Jun Du, Li-Rong Dai, Chin-Hui Lee:
SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement. 3713-3717 - Jishnu Sadasivan, Chandra Sekhar Seelamantula:
A Novel Risk-Estimation-Theoretic Framework for Speech Enhancement in Nonstationary and Non-Gaussian Noise Conditions. 3718-3722 - Suman Samui, Indrajit Chakrabarti, Soumya Kanti Ghosh:
Two-Stage Temporal Processing for Single-Channel Speech Enhancement. 3723-3727 - Nazreen P. M., A. G. Ramakrishnan, Prasanta Kumar Ghosh:
A Class-Specific Speech Enhancement for Phoneme Recognition: A Dictionary Learning Approach. 3728-3732 - Atsunori Ogawa, Shogo Seki, Keisuke Kinoshita, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, Kazuya Takeda:
Robust Example Search Using Bottleneck Features for Example-Based Speech Enhancement. 3733-3737 - Anurag Kumar, Dinei A. F. Florêncio:
Speech Enhancement in Multiple-Noise Conditions Using Deep Neural Networks. 3738-3742 - Prashanth Gurunath Shivakumar, Panayiotis G. Georgiou:
Perception Optimized Deep Denoising AutoEncoders for Speech Enhancement. 3743-3747 - Akihiro Kato, Ben P. Milner:
HMM-Based Speech Enhancement Using Sub-Word Models and Noise Adaptation. 3748-3752 - Li Li, Hirokazu Kameoka, Takuya Higuchi, Hiroshi Saruwatari:
Semi-Supervised Joint Enhancement of Spectral and Cepstral Sequences of Noisy Speech. 3753-3757 - Aleksej Chinaev, Reinhold Haeb-Umbach:
A priori SNR Estimation Using a Generalized Decision Directed Approach. 3758-3762 - Ziteng Wang, Xu Li, Xiaofei Wang, Qiang Fu, Yonghong Yan:
A DNN-HMM Approach to Non-Negative Matrix Factorization Based Speech Enhancement. 3763-3767 - Szu-Wei Fu, Yu Tsao, Xugang Lu:
SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement. 3768-3772 - Kehuang Li, Bo Wu, Chin-Hui Lee:
An Iterative Phase Recovery Framework with Phase Mask for Spectral Mapping with an Application to Speech Enhancement. 3773-3777 - Bin Liu, Jianhua Tao:
A Novel Research to Artificial Bandwidth Extension Based on Deep BLSTM Recurrent Neural Networks and Exemplar-Based Sparse Representation. 3778-3782
Far-Field, Robustness and Adaptation
- Vikramjit Mitra, Horacio Franco:
Coping with Unseen Data Conditions: Investigating Neural Net Architectures, Robust Features, and Information Fusion for Robust Speech Recognition. 3783-3787 - Natalia A. Tomashenko, Yuri Y. Khokhlov, Yannick Estève:
On the Use of Gaussian Mixture Model Framework to Improve Speaker Adaptation of Deep Neural Network Acoustic Models. 3788-3792 - Louis ten Bosch, Bert Cranen, Yang Sun:
Analytical Assessment of Dual-Stream Merging for Noise-Robust ASR. 3793-3797 - Erfan Loweimi, Jon Barker, Thomas Hain:
Use of Generalised Nonlinearity in Vector Taylor Series Noise Compensation for Robust Speech Recognition. 3798-3802 - Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara:
Joint Optimization of Denoising Autoencoder and DNN Acoustic Model Based on Multi-Target Learning for Noisy Speech Recognition. 3803-3807 - Takuya Higuchi, Takuya Yoshioka, Tomohiro Nakatani:
Optimization of Speech Enhancement Front-End with Speech Recognition-Level Criterion. 3808-3812 - Dung T. Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani:
Factorized Linear Input Network for Acoustic Model Adaptation in Noisy Conditions. 3813-3817 - Yusuke Fujita, Ryoichi Takashima, Takeshi Homma, Masahito Togami:
Data Augmentation Using Multi-Input Multi-Output Source Separation for Deep Neural Network Based Acoustic Modeling. 3818-3822 - Animesh Prasad, Khe Chai Sim:
Microphone Distance Adaptation Using Cluster Adaptive Training for Robust Far Field Speech Recognition. 3823-3827 - Dimitrios Dimitriadis, Samuel Thomas, Sriram Ganapathy:
An Investigation on the Use of i-Vectors for Robust ASR. 3828-3832 - Yulan Liu, Charles Fox, Madina Hasan, Thomas Hain:
The Sheffield Wargame Corpus - Day Two and Day Three. 3833-3837 - Suyoun Kim, Ian R. Lane:
Recurrent Models for Auditory Attention in Multi-Microphone Distant Speech Recognition. 3838-3842 - Wonkyum Lee, Kyu Jeong Han, Ian R. Lane:
Semi-Supervised Speaker Adaptation for In-Vehicle Speech Recognition with Deep Neural Networks. 3843-3847
Low Resource Speech Recognition
- Yan Huang, Yongqiang Wang, Yifan Gong:
Semi-Supervised Training in Deep Learning Acoustic Model. 3848-3852 - Samuel Thomas, Kartik Audhkhasi, Jia Cui, Brian Kingsbury, Bhuvana Ramabhadran:
Multilingual Data Selection for Low Resource Speech Recognition. 3853-3857 - Amit Das, Mark Hasegawa-Johnson:
An Investigation on Training Deep Neural Networks Using Probabilistic Transcriptions. 3858-3862 - Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson:
Analysis of Mismatched Transcriptions Generated by Humans and Machines for Under-Resourced Languages. 3863-3867 - Jan Nouza, Radek Safarík, Petr Cerva:
ASR for South Slavic Languages Developed in Almost Automated Way. 3868-3872 - Marzieh Razavi, Mathew Magimai-Doss:
Improving Under-Resourced Language ASR Through Latent Subword Unit Space Discovery. 3873-3877 - Markus Müller, Sebastian Stüker, Alex Waibel:
Language Adaptive DNNs for Improved Low Resource Speech Recognition. 3878-3882 - Tanel Alumäe, Stavros Tsakalidis, Richard M. Schwartz:
Improved Multilingual Training of Stacked Neural Network Acoustic Models for Low Resource Languages. 3883-3887