The development of file formats for very large speech corpora: SPHERE and SHORTEN. Date of Conference: 19-22 April 1994. Date Added to IEEE Xplore: ...
From its beginning, the SPHERE file format has been defined to be a very flexible structure, only minor modifications to the header contents (not the structure).
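To make the header flexibility concrete, here is a minimal sketch of reading a SPHERE header in Python. It assumes the commonly documented layout: an ASCII header that opens with `NIST_1A`, then the header size in bytes, then `name -type value` lines terminated by `end_head`. The function name `parse_sphere_header` and the example field values are hypothetical, chosen for illustration only.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header at the start of a SPHERE file (illustrative sketch)."""
    # The first two lines give the magic string and the total header size.
    magic, size_text = data[:16].decode("ascii").split("\n")[:2]
    if magic.strip() != "NIST_1A":
        raise ValueError("not a SPHERE header")
    header_size = int(size_text.strip())

    fields = {"header_size": header_size}
    text = data[:header_size].decode("ascii", errors="replace")
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comment lines
        name, ftype, value = line.split(None, 2)
        if ftype == "-i":          # integer field
            fields[name] = int(value)
        elif ftype == "-r":        # real-valued field
            fields[name] = float(value)
        else:                      # string field, written as -sN
            fields[name] = value
    return fields


# Hypothetical header, padded with spaces to the declared 1024 bytes.
example = (
    b"NIST_1A\n   1024\n"
    b"sample_rate -i 16000\n"
    b"channel_count -i 1\n"
    b"sample_coding -s3 pcm\n"
    b"end_head\n"
).ljust(1024, b" ")

fields = parse_sphere_header(example)
```

Because every field is a self-describing line, a reader written this way accepts new fields without any change to the parsing logic, which is the sense in which only the header contents need to change.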
This paper describes the development of a "standard" lossless compressed waveform file format which minimizes the media required for corpora distribution while ...
The development of file formats for very large speech corpora: Sphere and shorten. In Proc. ICASSP, volume I, pages 113-116, 1994. 2: N. S. Jayant and P ...
Speech data is released in NIST SPHERE, FLAC, MS WAV or MP3 format. Data in very large SPHERE corpora is compressed using shorten. All audio files are checked ...
Shorten is not a format but a compression algorithm developed by Tony Robinson. It uses the redundancy of about 50% in speech signals to compress the data ...
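The redundancy mentioned above can be illustrated with a small Python sketch. It is not Robinson's implementation: it only assumes the general idea of predicting each sample from its predecessors with a fixed polynomial predictor and keeping the prediction error, and it omits the entropy coding and block handling a real codec needs. The function names `residuals` and `reconstruct` are hypothetical.

```python
import math


def residuals(samples, order=2):
    """Apply `order` rounds of first differencing; the leading values act as warm-up."""
    res = list(samples)
    for _ in range(order):
        res = [res[0]] + [res[i] - res[i - 1] for i in range(1, len(res))]
    return res


def reconstruct(res, order=2):
    """Invert `residuals` exactly by applying `order` rounds of running sums."""
    out = list(res)
    for _ in range(order):
        acc = [out[0]]
        for r in out[1:]:
            acc.append(acc[-1] + r)
        out = acc
    return out


# A smooth synthetic signal standing in for speech (assumption: integer PCM samples).
samples = [round(1000 * math.sin(2 * math.pi * i / 50)) for i in range(200)]
res = residuals(samples)

restored = reconstruct(res)
total_signal = sum(abs(s) for s in samples)
total_residual = sum(abs(r) for r in res)
```

For a smooth signal the residuals are much smaller in magnitude than the samples, so they can be stored in fewer bits, while `reconstruct` returns the original integers exactly, which is what makes the scheme lossless.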
In this paper, we evaluate the performance of different learning methods on a prototypical natural language disambiguation task, confusion set disambiguation, ...
... file naming in the development of a speech corpus is concerned. Each separate utterance of a speech corpus usually has its own base-name with different ...
The other main format that is supported is long sound files accompanied by TextGrids that specify orthographic transcriptions for short intervals of speech.
A speech corpus is a collection of audio recordings that includes both normal and disordered speech, used for training algorithms in speech recognition systems.