A Simple Concatenation can Effectively Improve Speech Translation

Linlin Zhang, Kai Fan, Boxing Chen, Luo Si


Abstract
A speech translation (ST) data triple comprises speech, transcription, and translation. In the end-to-end paradigm, text machine translation (MT) usually serves as a teacher model for ST via knowledge distillation. Parameter sharing with the teacher is often adopted to construct the ST model architecture; however, the two modalities are fed independently and trained with different losses. This setup does not match the cross-modal nature of ST and also limits the upper bound of performance. Inspired by work on video Transformers, we propose a simple unified cross-modal ST method that concatenates speech and text as the input and builds a teacher that can exploit information from both modalities simultaneously. Experimental results show that, within our unified ST framework, models can effectively utilize the auxiliary information from speech and text and achieve compelling results on the MuST-C datasets.
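As a rough illustration of the concatenation idea described in the abstract, the following PyTorch sketch (not the authors' released code; every module, dimension, and parameter name below is an assumption for illustration) projects speech features and text embeddings into a shared space, concatenates them along the sequence axis with modality-type embeddings, and encodes the joint sequence with a shared Transformer encoder:

import torch
import torch.nn as nn

# Minimal sketch of the speech-text concatenation idea; all names are
# hypothetical and positional encodings are omitted for brevity.
class ConcatSTEncoder(nn.Module):
    def __init__(self, vocab_size, feat_dim=80, d_model=512, n_heads=8, n_layers=6):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        # Project pre-extracted acoustic features (e.g., log-Mel filterbanks)
        # into the shared embedding space.
        self.speech_proj = nn.Linear(feat_dim, d_model)
        # Learned type embeddings let the encoder tell the modalities apart.
        self.modality_embed = nn.Embedding(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, speech_feats, text_ids):
        # speech_feats: (B, T_s, feat_dim); text_ids: (B, T_t)
        s = self.speech_proj(speech_feats) + self.modality_embed.weight[0]
        t = self.text_embed(text_ids) + self.modality_embed.weight[1]
        joint = torch.cat([s, t], dim=1)  # (B, T_s + T_t, d_model)
        return self.encoder(joint)

# Example usage with random inputs:
# enc = ConcatSTEncoder(vocab_size=10000)
# out = enc(torch.randn(2, 120, 80), torch.randint(0, 10000, (2, 30)))

A translation decoder trained on this joint representation would play the teacher role described in the abstract; how the speech-only student is distilled from it is specified in the paper itself, not in this sketch.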
Anthology ID:
2023.acl-short.153
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1793–1802
URL:
https://aclanthology.org/2023.acl-short.153
DOI:
10.18653/v1/2023.acl-short.153
Cite (ACL):
Linlin Zhang, Kai Fan, Boxing Chen, and Luo Si. 2023. A Simple Concatenation can Effectively Improve Speech Translation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1793–1802, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
A Simple Concatenation can Effectively Improve Speech Translation (Zhang et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-short.153.pdf
Video:
https://aclanthology.org/2023.acl-short.153.mp4