Multitask Learning with CTC and Segmental CRF for Speech Recognition

Lu, Liang; Kong, Lingpeng; Dyer, Chris; Smith, Noah A.

arXiv:1702.06378 (cs)

[Submitted on 21 Feb 2017 (v1), last revised 5 Jun 2017 (this version, v4)]

Abstract: Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models. Both models define a transcription probability by marginalizing decisions about latent segmentation alternatives to derive a sequence probability: the former uses a globally normalized joint model of segment labels and durations, and the latter classifies each frame as either an output symbol or a "continuation" of the previous label. In this paper, we train a recognition model by optimizing an interpolation between the SCRF and CTC losses, where the same recurrent neural network (RNN) encoder is used for feature extraction for both outputs. We find that this multitask objective improves recognition accuracy when decoding with either the SCRF or CTC models. Additionally, we show that CTC can also be used to pretrain the RNN encoder, which improves the convergence rate when learning the joint model.
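
As an illustration of the interpolated objective described in the abstract, the following is a minimal pure-Python sketch, not the authors' implementation. It computes a CTC negative log-likelihood with the standard blank-symbol forward algorithm, a zeroth-order segmental CRF negative log-likelihood by summing over segmentations, and a weighted sum of the two. The function names, the `seg_score` callback, the `max_dur` limit, and the convention that `lam` weights the CTC term are all assumptions made for this example; in the paper's setting both sets of scores would come from the same RNN encoder, which is stubbed out here.

```python
import math

NEG_INF = float("-inf")

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values, default=NEG_INF)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def ctc_nll(log_probs, labels, blank=0):
    """-log p(labels | x) under CTC; log_probs[t][k] is a frame-level log-probability."""
    ext = [blank]
    for y in labels:
        ext += [y, blank]  # blank, y1, blank, y2, ..., blank
    S = len(ext)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]
            if s >= 1:
                terms.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(terms) + log_probs[t][ext[s]]
        alpha = new
    return -logsumexp(alpha[-2:])

def scrf_nll(seg_score, num_frames, labels, num_labels, max_dur):
    """-log p(labels | x) under a zeroth-order segmental CRF.

    seg_score(y, start, end) is a hypothetical callback that scores frames
    [start, end) as a single segment with label y.
    """
    J = len(labels)
    # Numerator: sum over segmentations that emit exactly `labels`.
    num = [[NEG_INF] * (J + 1) for _ in range(num_frames + 1)]
    num[0][0] = 0.0
    # Denominator: sum over all segmentations and all label sequences.
    den = [NEG_INF] * (num_frames + 1)
    den[0] = 0.0
    for t in range(1, num_frames + 1):
        starts = range(max(0, t - max_dur), t)
        den[t] = logsumexp([den[s] + seg_score(y, s, t)
                            for s in starts for y in range(num_labels)])
        for j in range(1, J + 1):
            num[t][j] = logsumexp([num[s][j - 1] + seg_score(labels[j - 1], s, t)
                                   for s in starts])
    return den[num_frames] - num[num_frames][J]

def multitask_loss(lam, ctc_loss, scrf_loss):
    """Interpolate the two losses; lam in [0, 1] weights the CTC term (assumed convention)."""
    return lam * ctc_loss + (1.0 - lam) * scrf_loss

# Toy usage: two frames, two symbols (0 = blank, 1 = a label), uniform scores.
frame_log_probs = [[math.log(0.5), math.log(0.5)]] * 2
loss = multitask_loss(
    0.5,
    ctc_nll(frame_log_probs, [1]),
    scrf_nll(lambda y, start, end: 0.0, 2, [1], 2, 2),
)
```

Setting `lam` to 1 or 0 recovers pure CTC or pure SCRF training, which is one way to compare the single-task and multitask objectives in such a sketch.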

Comments:	5 pages, 2 figures, camera ready version at Interspeech 2017
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1702.06378 [cs.CL]
	(or arXiv:1702.06378v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1702.06378

Submission history

From: Liang Lu
[v1] Tue, 21 Feb 2017 13:39:35 UTC (144 KB)
[v2] Mon, 6 Mar 2017 02:40:45 UTC (144 KB)
[v3] Thu, 23 Mar 2017 20:42:54 UTC (144 KB)
[v4] Mon, 5 Jun 2017 18:19:34 UTC (144 KB)
