Hagen Soltau
2020 – today
- 2024
- [c60]Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey:
Retrieval Augmented End-to-End Spoken Dialog Models. ICASSP 2024: 12056-12060 - [i18]Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey:
Retrieval Augmented End-to-End Spoken Dialog Models. CoRR abs/2402.01828 (2024) - [i17]Ying Ma, Owen Burns, Mingqiu Wang, Gang Li, Nan Du, Laurent El Shafey, Liqiang Wang, Izhak Shafran, Hagen Soltau:
Knowledge Graph Reasoning with Self-supervised Reinforcement Learning. CoRR abs/2405.13640 (2024) - 2023
- [c59]Hagen Soltau, Izhak Shafran, Alex Ottenwess, Joseph R. Duffy, Rene L. Utianski, Leland R. Barnard, John L. Stricker, Daniela A. Wiepert, David T. Jones, Hugo Botha:
Detecting Speech Abnormalities With a Perceiver-Based Sequence Classifier that Leverages a Universal Speech Model. ASRU 2023: 1-7 - [c58]Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Nanxin Chen, Yu Zhang, Hagen Soltau, Paul K. Rubenstein, Lukas Zilka, Dian Yu, Golan Pundak, Nikhil Siddhartha, Johan Schalkwyk, Yonghui Wu:
SLM: Bridge the Thin Gap Between Speech and Text Foundation Models. ASRU 2023: 1-8 - [c57]Jeffrey Zhao, Yuan Cao, Raghav Gupta, Harrison Lee, Abhinav Rastogi, Mingqiu Wang, Hagen Soltau, Izhak Shafran, Yonghui Wu:
AnyTOD: A Programmable Task-Oriented Dialog System. EMNLP 2023: 16189-16204 - [c56]Hagen Soltau, Izhak Shafran, Mingqiu Wang, Abhinav Rastogi, Jeffrey Zhao, Ye Jia, Wei Han, Yuan Cao, Aramys Miranda:
Speech Aware Dialog System Technology Challenge (DSTC11). INTERSPEECH 2023: 4668-4672 - [i16]Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara N. Sainath, Pedro J. Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu:
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages. CoRR abs/2303.01037 (2023) - [i15]Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey:
Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding. CoRR abs/2306.07944 (2023) - [i14]Nanxin Chen, Izhak Shafran, Yu Zhang, Chung-Cheng Chiu, Hagen Soltau, James Qin, Yonghui Wu:
Efficient Adapters for Giant Speech Models. CoRR abs/2306.08131 (2023) - [i13]Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Yongqiang Wang, Nanxin Chen, Yu Zhang, Hagen Soltau, Paul K. Rubenstein, Lukas Zilka, Dian Yu, Zhong Meng, Golan Pundak, Nikhil Siddhartha, Johan Schalkwyk, Yonghui Wu:
SLM: Bridge the thin gap between speech and text foundation models. CoRR abs/2310.00230 (2023) - [i12]Hagen Soltau, Izhak Shafran, Alex Ottenwess, Joseph R. Duffy, Rene L. Utianski, Leland R. Barnard, John L. Stricker, Daniela A. Wiepert, David T. Jones, Hugo Botha:
Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model. CoRR abs/2310.13010 (2023) - 2022
- [c55]Dian Yu, Mingqiu Wang, Yuan Cao, Laurent El Shafey, Izhak Shafran, Hagen Soltau:
Knowledge-grounded Dialog State Tracking. EMNLP (Findings) 2022: 3428-3435 - [c54]Hagen Soltau, Izhak Shafran, Mingqiu Wang, Laurent El Shafey:
RNN Transducers for Named Entity Recognition with constraints on alignment for understanding medical conversations. INTERSPEECH 2022: 1901-1905 - [c53]Dian Yu, Mingqiu Wang, Yuan Cao, Izhak Shafran, Laurent El Shafey, Hagen Soltau:
Unsupervised Slot Schema Induction for Task-oriented Dialog. NAACL-HLT 2022: 1174-1193 - [i11]Hagen Soltau, Izhak Shafran, Mingqiu Wang, Laurent El Shafey:
RNN Transducers for Nested Named Entity Recognition with constraints on alignment for long sequences. CoRR abs/2203.03543 (2022) - [i10]Dian Yu, Mingqiu Wang, Yuan Cao, Izhak Shafran, Laurent El Shafey, Hagen Soltau:
Unsupervised Slot Schema Induction for Task-oriented Dialog. CoRR abs/2205.04515 (2022) - [i9]Dian Yu, Mingqiu Wang, Yuan Cao, Izhak Shafran, Laurent El Shafey, Hagen Soltau:
Knowledge-grounded Dialog State Tracking. CoRR abs/2210.06656 (2022) - [i8]Hagen Soltau, Izhak Shafran, Mingqiu Wang, Abhinav Rastogi, Jeffrey Zhao, Ye Jia, Wei Han, Yuan Cao, Aramys Miranda:
Speech Aware Dialog System Technology Challenge (DSTC11). CoRR abs/2212.08704 (2022) - [i7]Jeffrey Zhao, Yuan Cao, Raghav Gupta, Harrison Lee, Abhinav Rastogi, Mingqiu Wang, Hagen Soltau, Izhak Shafran, Yonghui Wu:
AnyTOD: A Programmable Task-Oriented Dialog System. CoRR abs/2212.09939 (2022) - 2021
- [c52]Mingqiu Wang, Hagen Soltau, Laurent El Shafey, Izhak Shafran:
Word-Level Confidence Estimation for RNN Transducers. ASRU 2021: 1170-1177 - [c51]Hagen Soltau, Mingqiu Wang, Izhak Shafran, Laurent El Shafey:
Understanding Medical Conversations: Rich Transcription, Confidence Scores & Information Extraction. Interspeech 2021: 4418-4422 - [i6]Hagen Soltau, Mingqiu Wang, Izhak Shafran, Laurent El Shafey:
Understanding Medical Conversations: Rich Transcription, Confidence Scores & Information Extraction. CoRR abs/2104.02219 (2021) - [i5]Mingqiu Wang, Hagen Soltau, Laurent El Shafey, Izhak Shafran:
Word-level confidence estimation for RNN transducers. CoRR abs/2110.15222 (2021) - 2020
- [c50]Izhak Shafran, Nan Du, Linh Tran, Amanda Perry, Lauren Keyes, Mark Knichel, Ashley Domin, Lei Huang, Yuhui Chen, Gang Li, Mingqiu Wang, Laurent El Shafey, Hagen Soltau, Justin S. Paul:
The Medical Scribe: Corpus Development and Model Performance Analyses. LREC 2020: 2036-2044 - [i4]Izhak Shafran, Nan Du, Linh Tran, Amanda Perry, Lauren Keyes, Mark Knichel, Ashley Domin, Lei Huang, Yuhui Chen, Gang Li, Mingqiu Wang, Laurent El Shafey, Hagen Soltau, Justin S. Paul:
The Medical Scribe: Corpus Development and Model Performance Analyses. CoRR abs/2003.11531 (2020)
2010 – 2019
- 2019
- [c49]Anshuman Tripathi, Han Lu, Hasim Sak, Hagen Soltau:
Monotonic Recurrent Neural Network Transducer and Decoding Strategies. ASRU 2019: 944-948 - [c48]Laurent El Shafey, Hagen Soltau, Izhak Shafran:
Joint Speech Recognition and Speaker Diarization via Sequence Transduction. INTERSPEECH 2019: 396-400 - [i3]Laurent El Shafey, Hagen Soltau, Izhak Shafran:
Joint Speech Recognition and Speaker Diarization via Sequence Transduction. CoRR abs/1907.05337 (2019) - 2017
- [c47]Hagen Soltau, Hank Liao, Hasim Sak:
Reducing the computational complexity for whole word models. ASRU 2017: 63-68 - [c46]Hagen Soltau, Hank Liao, Hasim Sak:
Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition. INTERSPEECH 2017: 3707-3711 - 2016
- [i2]Hagen Soltau, Hank Liao, Hasim Sak:
Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition. CoRR abs/1610.09975 (2016) - 2015
- [j5]Tara N. Sainath, Brian Kingsbury, George Saon, Hagen Soltau, Abdel-rahman Mohamed, George E. Dahl, Bhuvana Ramabhadran:
Deep Convolutional Neural Networks for Large-scale Speech Tasks. Neural Networks 64: 39-48 (2015) - 2014
- [c45]Samuel Thomas, Sriram Ganapathy, George Saon, Hagen Soltau:
Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions. ICASSP 2014: 2519-2523 - [c44]David Nolden, Hagen Soltau, Hermann Ney:
Progress in dynamic network decoding. ICASSP 2014: 3276-3280 - [c43]George Saon, Hagen Soltau:
A comparison of two optimization techniques for sequence discriminative training of deep neural networks. ICASSP 2014: 5567-5571 - [c42]Hagen Soltau, George Saon, Tara N. Sainath:
Joint training of convolutional and non-convolutional neural networks. ICASSP 2014: 5572-5576 - [c41]Hong-Kwang Kuo, Ellen Eide Kislal, Lidia Mangu, Hagen Soltau, Tomás Beran:
Out-of-vocabulary word detection in a speech-to-speech translation system. ICASSP 2014: 7108-7112 - [c40]Lidia Mangu, Brian Kingsbury, Hagen Soltau, Hong-Kwang Kuo, Michael Picheny:
Efficient spoken term detection using confusion networks. ICASSP 2014: 7844-7848 - [c39]George Saon, Hagen Soltau, Ahmad Emami, Michael Picheny:
Unfolded recurrent neural networks for speech recognition. INTERSPEECH 2014: 343-347 - [c38]David Nolden, Hagen Soltau, Daniel Povey, Pegah Ghahremani, Lidia Mangu, Hermann Ney:
Removing redundancy from lattices. INTERSPEECH 2014: 656-660 - [p1]Hagen Soltau, George Saon, Lidia Mangu, Hong-Kwang Kuo, Brian Kingsbury, Stephen M. Chu, Fadi Biadsy:
Automatic Speech Recognition. NLP of Semitic Languages 2014: 409-459 - 2013
- [j4]Tara N. Sainath, Brian Kingsbury, Hagen Soltau, Bhuvana Ramabhadran:
Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks. IEEE Trans. Speech Audio Process. 21(11): 2267-2276 (2013) - [c37]George Saon, Hagen Soltau, David Nahamoo, Michael Picheny:
Speaker adaptation of neural network acoustic models using i-vectors. ASRU 2013: 55-59 - [c36]Lidia Mangu, Hagen Soltau, Hong-Kwang Kuo, George Saon:
The IBM keyword search system for the DARPA RATS program. ASRU 2013: 204-209 - [c35]Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George E. Dahl, George Saon, Hagen Soltau, Tomás Beran, Aleksandr Y. Aravkin, Bhuvana Ramabhadran:
Improvements to Deep Convolutional Neural Networks for LVCSR. ASRU 2013: 315-320 - [c34]Lidia Mangu, Hagen Soltau, Hong-Kwang Kuo, Brian Kingsbury, George Saon:
Exploiting diversity for spoken term detection. ICASSP 2013: 8282-8286 - [c33]Amr El-Desoky Mousa, Hong-Kwang Jeff Kuo, Lidia Mangu, Hagen Soltau:
Morpheme-based feature-rich language models using Deep Neural Networks for LVCSR of Egyptian Arabic. ICASSP 2013: 8435-8439 - [c32]Hagen Soltau, Hong-Kwang Kuo, Lidia Mangu, George Saon, Tomás Beran:
Neural network acoustic models for the DARPA RATS program. INTERSPEECH 2013: 3092-3096 - [c31]George Saon, Samuel Thomas, Hagen Soltau, Sriram Ganapathy, Brian Kingsbury:
The IBM speech activity detection system for the DARPA RATS program. INTERSPEECH 2013: 3497-3501 - [i1]Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George E. Dahl, George Saon, Hagen Soltau, Tomás Beran, Aleksandr Y. Aravkin, Bhuvana Ramabhadran:
Improvements to deep convolutional neural networks for LVCSR. CoRR abs/1309.1501 (2013) - 2012
- [j3]George Saon, Hagen Soltau:
Boosting systems for large vocabulary continuous speech recognition. Speech Commun. 54(2): 212-218 (2012) - [c30]Brian Kingsbury, Tara N. Sainath, Hagen Soltau:
Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization. INTERSPEECH 2012: 10-13 - 2011
- [c29]Hagen Soltau, Lidia Mangu, Fadi Biadsy:
From Modern Standard Arabic to Levantine ASR: Leveraging GALE for dialects. ASRU 2011: 266-271 - [c28]Lidia Mangu, Hong-Kwang Kuo, Stephen M. Chu, Brian Kingsbury, George Saon, Hagen Soltau, Fadi Biadsy:
The IBM 2011 GALE Arabic speech transcription system. ASRU 2011: 272-277 - [c27]Brian Kingsbury, Hagen Soltau, George Saon, Stephen M. Chu, Hong-Kwang Kuo, Lidia Mangu, Suman V. Ravuri, Nelson Morgan, Adam Janin:
The IBM 2009 GALE Arabic speech transcription system. ICASSP 2011: 4672-4675 - 2010
- [c26]George Saon, Hagen Soltau, Upendra V. Chaudhari, Stephen M. Chu, Brian Kingsbury, Hong-Kwang Kuo, Lidia Mangu, Daniel Povey:
The IBM 2008 GALE Arabic speech transcription system. ICASSP 2010: 4378-4381 - [c25]Chengyuan Ma, Hong-Kwang Jeff Kuo, Hagen Soltau, Xiaodong Cui, Upendra V. Chaudhari, Lidia Mangu, Chin-Hui Lee:
A comparative study on system combination schemes for LVCSR. ICASSP 2010: 4394-4397 - [c24]Ahmad Emami, Stanley F. Chen, Abraham Ittycheriah, Hagen Soltau, Bing Zhao:
Decoding with shrinkage-based language models. INTERSPEECH 2010: 1033-1036 - [c23]George Saon, Hagen Soltau:
Boosting systems for LVCSR. INTERSPEECH 2010: 1341-1344 - [c22]Fadi Biadsy, Hagen Soltau, Lidia Mangu, Jirí Navrátil, Julia Hirschberg:
Discriminative Phonotactics for Dialect Recognition Using Context-Dependent Phone Classifiers. Odyssey 2010: 44 - [c21]Hagen Soltau, George Saon, Brian Kingsbury:
The IBM Attila speech recognition toolkit. SLT 2010: 97-102
2000 – 2009
- 2009
- [j2]Hagen Soltau, George Saon, Brian Kingsbury, Hong-Kwang Jeff Kuo, Lidia Mangu, Daniel Povey, Ahmad Emami:
Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program. IEEE Trans. Speech Audio Process. 17(5): 884-894 (2009) - [c20]Hagen Soltau, George Saon:
Dynamic network decoding revisited. ASRU 2009: 276-281 - [c19]George Saon, Daniel Povey, Hagen Soltau:
Large margin semi-tied covariance transforms for discriminative training. ICASSP 2009: 3753-3756 - 2008
- [c18]Daniel Povey, Hong-Kwang Jeff Kuo, Hagen Soltau:
Fast speaker adaptive training for speech recognition. INTERSPEECH 2008: 1245-1248 - 2007
- [c17]Hagen Soltau, George Saon, Brian Kingsbury, Hong-Kwang Jeff Kuo, Lidia Mangu, Daniel Povey, Geoffrey Zweig:
The IBM 2006 Gale Arabic ASR System. ICASSP (4) 2007: 349-352 - 2006
- [j1]Stanley F. Chen, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon, Hagen Soltau, Geoffrey Zweig:
Advances in speech transcription at IBM under the DARPA EARS program. IEEE Trans. Speech Audio Process. 14(5): 1596-1608 (2006) - 2005
- [b1]Hagen Soltau:
Compensating hyperarticulation for automatic speech recognition. Karlsruhe Institute of Technology, Germany, 2005, pp. 1-156 - [c16]Hagen Soltau, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon, Geoffrey Zweig:
The IBM 2004 Conversational Telephony System for Rich Transcription. ICASSP (1) 2005: 205-208 - [c15]Daniel Povey, Brian Kingsbury, Lidia Mangu, George Saon, Hagen Soltau, Geoffrey Zweig:
fMPE: Discriminatively Trained Features for Speech Recognition. ICASSP (1) 2005: 961-964 - 2004
- [c14]Hagen Soltau, Hua Yu, Florian Metze, Christian Fügen, Qin Jin, Szu-Chen Stan Jou:
The 2003 ISL rich transcription system for conversational telephony speech. ICASSP (1) 2004: 773-776 - 2002
- [c13]Hagen Soltau, Florian Metze, Christian Fügen, Alex Waibel:
Efficient language model lookahead through polymorphic linguistic context assignment. ICASSP 2002: 709-712 - [c12]Hagen Soltau, Florian Metze, Alex Waibel:
Compensating for hyperarticulation by modeling articulatory properties. INTERSPEECH 2002: 841-844 - 2001
- [c11]Hagen Soltau, Thomas Schaaf, Florian Metze, Alex Waibel:
The ISL evaluation system for Verbmobil-II. ICASSP 2001: 65-68 - [c10]John W. McDonough, Florian Metze, Hagen Soltau, Alex Waibel:
Speaker compensation with sine-log all-pass transforms. ICASSP 2001: 369-372 - [c9]Alex Waibel, Michael Bett, Florian Metze, Klaus Ries, Thomas Schaaf, Tanja Schultz, Hagen Soltau, Hua Yu, Klaus Zechner:
Advances in automatic meeting record creation and access. ICASSP 2001: 597-600 - [c8]Florian Metze, John W. McDonough, Hagen Soltau:
Speech recognition over netmeeting connections. INTERSPEECH 2001: 2389-2392 - [c7]Alex Waibel, Hua Yu, Tanja Schultz, Yue Pan, Michael Bett, Martin Westphal, Hagen Soltau, Thomas Schaaf, Florian Metze:
Advances in meeting recognition. HLT 2001 - 2000
- [c6]Hagen Soltau, Alex Waibel:
Specialized acoustic models for hyperarticulated speech. ICASSP 2000: 1779-1782 - [c5]Florian Metze, Thomas Kemp, Thomas Schaaf, Tanja Schultz, Hagen Soltau:
Confidence measure based language identification. ICASSP 2000: 1827-1830 - [c4]Hagen Soltau, Alex Waibel:
Phone dependent modeling of hyperarticulated effects. INTERSPEECH 2000: 105-108
1990 – 1999
- 1998
- [c3]Hagen Soltau, Tanja Schultz, Martin Westphal, Alex Waibel:
Recognition of music types. ICASSP 1998: 1137-1140 - [c2]Hagen Soltau, Alex Waibel:
On the influence of hyperarticulated speech on recognition performance. ICSLP 1998 - 1996
- [c1]Tanja Schultz, Hagen Soltau:
Automatische Identifizierung spontan gesprochener Sprachen mit neuronalen Netzen. KONVENS 1996: 102-110
last updated on 2024-10-07 22:10 CEST by the dblp team
all metadata released as open data under CC0 1.0 license