Text spotting in large speech databases for under-resourced languages
2013 7th Conference on Speech Technology and Human-Computer …, 2013•ieeexplore.ieee.org
Lightly supervised acoustic modeling in under-resourced languages raises new issues, owing to the poor accuracy of Automatic Speech Recognition (ASR) systems for such languages and the quality of the speech transcriptions that are available. Under these conditions, common alignment techniques cannot always align the ASR output with the approximate transcription. We propose two alignment methods that overcome these issues. The first applies an image processing algorithm to the matching matrix of the two texts to be aligned; the second is based on segmental Dynamic Time Warping (DTW). On average, the two approaches extract 29% and 27% more speech data, respectively, than the standard DTW technique currently in use.
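The abstract does not spell out the image processing step or the segmental DTW variant, so the following Python sketch illustrates only the two ingredients it names explicitly: a binary matching matrix between the ASR output and the approximate transcription, and the baseline DTW alignment that both proposed methods are compared against. The function names, the word-level tokenization, and the 0/1 mismatch cost are assumptions made for illustration, not details taken from the paper.

```python
def matching_matrix(asr_words, ref_words):
    """Hypothetical helper: M[i][j] is 1 when ASR word i equals transcript word j."""
    return [[1 if a == r else 0 for r in ref_words] for a in asr_words]


def dtw_align(asr_words, ref_words):
    """Baseline DTW over two non-empty word sequences with an assumed 0/1 cost.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    running from (0, 0) to (len(asr_words) - 1, len(ref_words) - 1).
    """
    n, m = len(asr_words), len(ref_words)
    inf = float("inf")
    # cost[i][j]: best cumulative cost of aligning the first i ASR words
    # with the first j transcript words.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 0 if asr_words[i - 1] == ref_words[j - 1] else 1
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in the ASR output only
                cost[i][j - 1],      # advance in the transcription only
            )

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


asr = "the cat sat on mat".split()          # toy ASR output
ref = "the cat sat on the mat".split()      # toy approximate transcription
M = matching_matrix(asr, ref)
total_cost, path = dtw_align(asr, ref)
```

In this toy example the matching matrix has a clear diagonal of ones with a gap where the transcription contains an extra word. Global DTW must still route a single path through that gap, which hints at why a noisy ASR output or a loose transcription can defeat it and why methods that look for locally matching regions may recover more data.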