Efficient Domain Adaptation for Speech Foundation Models

Li, Bo; Hwang, Dongseong; Huo, Zhouyuan; Bai, Junwen; Prakash, Guru; Sainath, Tara N.; Sim, Khe Chai; Zhang, Yu; Han, Wei; Strohman, Trevor; Beaufays, Francoise

arXiv:2302.01496 (cs)

[Submitted on 3 Feb 2023]

Abstract: Foundation models (FMs), which are trained on broad data at scale and are adaptable to a wide range of downstream tasks, have attracted great interest in the research community. Benefiting from diverse data sources such as different modalities, languages and application domains, foundation models have demonstrated strong generalization and knowledge transfer capabilities. In this paper, we present a pioneering study towards building an efficient solution for FM-based speech recognition systems. We adopt the recently developed self-supervised BEST-RQ for pretraining, and propose joint finetuning with both source and unsupervised target domain data using JUST Hydra. The FM encoder adapter and decoder are then finetuned to the target domain with a small amount of supervised in-domain data. On a large-scale YouTube and Voice Search task, our method is shown to be both data and model parameter efficient. It achieves the same quality with only 21.6M supervised in-domain data and 130.8M finetuned parameters, compared to the 731.1M-parameter model trained from scratch on an additional 300M supervised in-domain data.
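To put the parameter-efficiency figures in perspective, the short sketch below computes the finetuned parameter count as a share of the from-scratch baseline size, using only the two numbers quoted in the abstract. It assumes that dividing one count by the other is a meaningful comparison; the abstract does not state whether the 130.8M finetuned parameters are a subset of the 731.1M model or belong to a differently sized FM. The constant and function names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract, in millions of parameters.
FINETUNED_PARAMS_M = 130.8  # parameters updated during target-domain finetuning
BASELINE_PARAMS_M = 731.1   # size of the baseline model trained from scratch


def finetuned_fraction(finetuned_m: float, baseline_m: float) -> float:
    """Return the finetuned parameter count as a fraction of the baseline size."""
    if baseline_m <= 0:
        raise ValueError("baseline size must be positive")
    return finetuned_m / baseline_m


fraction = finetuned_fraction(FINETUNED_PARAMS_M, BASELINE_PARAMS_M)
print(f"finetuned share of baseline: {fraction:.1%}")  # prints 17.9%
print(f"reduction factor: {1 / fraction:.1f}x")        # prints 5.6x
```

Under that assumption, the adapted system updates roughly one sixth as many parameters as the baseline trains. The data figures (21.6M versus an additional 300M) are left out of the sketch because the abstract does not give their unit or the baseline's total.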

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2302.01496 [cs.CL]
	(or arXiv:2302.01496v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2302.01496

Submission history

From: Junwen Bai
[v1] Fri, 3 Feb 2023 02:10:35 UTC (67 KB)
