Translation model based on discrete Fourier transform and skipping sub-layer methods
Y Li, S Chen, Z Liu, C Che, Z Zhong
International Journal of Machine Learning and Cybernetics, 2024, Springer
Abstract
Machine translation quality has improved tremendously since the advent of neural machine translation. However, translation models are memory intensive, demand expensive hardware, and train slowly. To reduce memory requirements and speed up translation, we propose the Transformer Discrete Fourier method with Skipping Sub-Layer (TF-SSL), which applies the discrete Fourier transform and a Skipping Sub-Layer algorithm after relative positional embedding of Chinese and English source sentences. The input sequence is processed by a Transformer-based relative positional embedding layer, where the embedding matrix maps the text to word vectors carrying positional information, so that the word vectors can effectively capture interdependences within the text. We place the coefficient matrix produced by a 2D Fourier transform near the center of the encoder, using a short matrix of transform coefficients, which accelerates translation on a GPU. Accuracy and speed are further improved by the skipping sub-layer method: sub-layers are randomly omitted during training to introduce perturbation, thus imposing a stronger regularizing constraint on the sub-layers. We conduct an ablation study and comparative analyses. Results show that our approach improves both BLEU scores and GFLOPS compared to the baseline Transformer model and other deep learning models.
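The abstract mentions relative positional embedding but not which variant the authors use. Below is a minimal PyTorch sketch of one common form (Shaw-style learned relative-position biases added to attention logits); the class name, max_distance, and clipping scheme are illustrative assumptions, not the paper's implementation:

```python
import torch

class RelativePositionBias(torch.nn.Module):
    """Learned relative-position biases for attention logits.

    Sketch of a Shaw-style scheme; the paper's exact variant is
    unspecified in the abstract.
    """
    def __init__(self, num_heads: int, max_distance: int = 128):
        super().__init__()
        self.max_distance = max_distance
        # One learned bias per head per clipped relative offset.
        self.bias = torch.nn.Embedding(2 * max_distance + 1, num_heads)

    def forward(self, seq_len: int) -> torch.Tensor:
        pos = torch.arange(seq_len)
        rel = pos[None, :] - pos[:, None]  # (seq, seq) signed offsets
        rel = rel.clamp(-self.max_distance, self.max_distance) + self.max_distance
        return self.bias(rel).permute(2, 0, 1)  # (heads, seq, seq)
```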
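The 2D discrete Fourier mixing step resembles FNet-style token mixing. The abstract does not specify how the "short matrix of transform coefficients" is obtained, so the following sketch shows only the basic 2D FFT mixing under that FNet-like assumption; all names are hypothetical:

```python
import torch

class FourierMixing(torch.nn.Module):
    """Token mixing via a 2D discrete Fourier transform (FNet-style sketch).

    Assumption: as in FNet, an FFT is applied over the sequence and hidden
    dimensions and the real part is kept, replacing attention's token mixing.
    """
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); fft2 transforms the last two dims.
        return torch.fft.fft2(x).real
```

Because the FFT has no learned parameters and runs in O(n log n), a block like this is cheaper than self-attention on a GPU, which is consistent with the GFLOPS improvements the abstract reports.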
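The skipping sub-layer method, as described, randomly omits a sub-layer during training to perturb optimization, much in the spirit of stochastic depth or LayerDrop. A minimal sketch, assuming a residual sub-layer wrapper and a hypothetical skip probability p_skip (neither is given in the abstract):

```python
import torch

class SkippingSubLayer(torch.nn.Module):
    """Wraps a residual sub-layer and randomly skips it during training.

    Sketch in the spirit of stochastic depth / LayerDrop; the paper's
    exact skipping schedule is not specified in the abstract.
    """
    def __init__(self, sublayer: torch.nn.Module, p_skip: float = 0.2):
        super().__init__()
        self.sublayer = sublayer
        self.p_skip = p_skip  # probability of omitting the sub-layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.p_skip:
            return x  # skip: identity pass-through (residual path only)
        return x + self.sublayer(x)  # normal residual sub-layer
```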