Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition

Hu, Yuchen; Chen, Chen; Li, Ruizhe; Zhu, Qiushi; Chng, Eng Siong

arXiv:2302.11362 (eess)

[Submitted on 22 Feb 2023 (v1), last revised 3 May 2023 (this version, v2)]

Abstract: Speech enhancement (SE) has proven effective at reducing noise in noisy speech signals for downstream automatic speech recognition (ASR), where a multi-task learning strategy is employed to jointly optimize the two tasks. However, the enhanced speech learned under the SE objective does not always yield good ASR results. From an optimization view, the gradients of the SE and ASR tasks sometimes interfere with each other, which can hinder multi-task learning and ultimately lead to sub-optimal ASR performance. In this paper, we propose a simple yet effective approach called gradient remedy (GR) to resolve interference between task gradients in noise-robust speech recognition, in terms of both angle and magnitude. Specifically, we first project the SE task's gradient onto a dynamic surface that is at an acute angle to the ASR gradient, in order to remove the conflict between them and assist ASR optimization. Furthermore, we adaptively rescale the magnitudes of the two gradients to prevent the dominant ASR task from being misled by the SE gradient. Experimental results show that the proposed approach resolves the gradient interference well and achieves relative word error rate (WER) reductions of 9.3% and 11.1% over the multi-task learning baseline on the RATS and CHiME-4 datasets, respectively. Our code is available on GitHub.
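
The two steps described in the abstract (fixing the angle, then the magnitude) can be illustrated with a small sketch. This is not the authors' implementation: the function name `remedy_se_gradient`, the fixed target angle `theta_deg` (the paper's surface is dynamic), and the simple rule of capping the SE gradient norm at the ASR gradient norm are all assumptions made here for illustration only.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(dot(a, a))


def remedy_se_gradient(g_se, g_asr, theta_deg=60.0):
    """Hypothetical sketch of an angle-and-magnitude gradient fix.

    Assumptions (not taken from the paper): a fixed acute target angle
    theta_deg, and a cap on the SE gradient norm at the ASR gradient norm.
    """
    n_asr = norm(g_asr)
    if n_asr == 0.0 or norm(g_se) == 0.0:
        return list(g_se)

    out = list(g_se)
    # Angle step: a negative inner product means the two gradients conflict.
    if dot(g_se, g_asr) < 0.0:
        u = [x / n_asr for x in g_asr]            # unit vector along g_asr
        par = dot(g_se, u)                        # signed length along g_asr
        perp = [x - par * ux for x, ux in zip(g_se, u)]
        # Keep the orthogonal part and add a positive component along g_asr
        # so that the result makes an angle of theta_deg with g_asr.
        along = norm(perp) / math.tan(math.radians(theta_deg))
        out = [p + along * ux for p, ux in zip(perp, u)]

    # Magnitude step (assumed rule): do not let the SE gradient exceed
    # the ASR gradient in norm.
    n_out = norm(out)
    if n_out > n_asr:
        out = [x * n_asr / n_out for x in out]
    return out


g_asr = [1.0, 0.0]
g_se = [-1.0, 1.0]                                # conflicts with g_asr
g_se_fixed = remedy_se_gradient(g_se, g_asr)
cos_after = dot(g_se_fixed, g_asr) / (norm(g_se_fixed) * norm(g_asr))
print(cos_after > 0.0)
```

In a real multi-task setup the same operation would be applied to the flattened gradients of the shared parameters before the optimizer step; a non-conflicting SE gradient that is already smaller than the ASR gradient passes through unchanged.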

Comments:	5 pages, 5 figures, Accepted by ICASSP 2023
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2302.11362 [eess.AS]
	(or arXiv:2302.11362v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2302.11362

Submission history

From: Yuchen Hu
[v1] Wed, 22 Feb 2023 13:31:13 UTC (624 KB)
[v2] Wed, 3 May 2023 05:06:51 UTC (852 KB)
