In this paper, we proposed a simple approach to filtering noisy sentence pairs from a synthetic parallel corpus generated with back-translation. We measured the ...
Experimental results on the IWSLT 2017 Korean→English translation task show that despite using much less data, this method outperforms the baseline NMT ...
Abstract: Synthetic data has been shown to be effective in training state-of-the-art neural machine translation (NMT) systems. Because the synthetic data is ...
Jan 6, 2023 · New research from IBM, UC San Diego explores synthetic parallel data as a new means of pre-training machine translation models.
Guanghao Xu, Youngjoong Ko, Jungyun Seo: Improving Neural Machine Translation by Filtering Synthetic Parallel Data. Entropy 21(12): 1213 (2019).
2.1 Previous Work. Considering the limited size of noisy parallel data, data augmentation methods are commonly used to generate more noisy training materials.
Aug 22, 2024 · This paper proposes a novel way of utilizing a monolingual corpus on the source side to assist Neural Machine Translation (NMT) in low-resource ...
Feb 8, 2024 · Synthetic parallel data: Similar to the English-German setup, 3.2 million Turkish sentences were back-translated into English to create ...
We propose a method to effectively expand the training data via filtering the pseudo-parallel corpus using quality estimation based on sentence-level round- ...
Improving Neural Machine Translation by Filtering Synthetic Parallel Data. Entropy 2019, 21, 1213. https://doi.org/10.3390/e21121213. AMA Style. Xu G, Ko Y ...