Bilingual data selection using a continuous vector-space representation

M Chinea-Rios, G Sanchis-Trilles… - Structural, Syntactic, and …, 2016 - Springer
Structural, Syntactic, and Statistical Pattern Recognition: Joint IAPR …, 2016 - Springer
Abstract
Data selection aims to select the best data subset from an available pool of sentences with which to train a pattern recognition system. In this article, we present a bilingual data selection method that leverages a continuous vector-space representation of word sequences to select the best subset of a bilingual corpus for training a machine translation system. We compared our proposal with a state-of-the-art data selection technique (cross-entropy), obtaining very promising results that were consistent across different language pairs.
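
The abstract does not say how sentences are embedded or scored, so the following is only a minimal sketch of the general idea of vector-space bilingual data selection, not the authors' method. It assumes, purely for illustration, that a sentence vector is the average of its word vectors, that in-domain data is summarised by a centroid, that similarity is cosine, and that a pair is kept when both its source and target sides pass a threshold. All function names, parameters, and the threshold are hypothetical.

```python
import math


def sentence_vector(sentence, word_vectors, dim):
    """Average the vectors of known words (assumption: simple averaging)."""
    total = [0.0] * dim
    count = 0
    for word in sentence.lower().split():
        vec = word_vectors.get(word)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    # Sentences with no known words map to the zero vector.
    return [t / count for t in total] if count else total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(sentences, word_vectors, dim):
    """Component-wise mean of the sentence vectors."""
    vecs = [sentence_vector(s, word_vectors, dim) for s in sentences]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def select_pairs(pool, in_domain, src_vectors, tgt_vectors, dim, threshold):
    """Keep pool pairs whose source AND target sides resemble the in-domain data.

    Hypothetical scoring rule: the pair score is the lower of the two
    per-language cosine similarities to the in-domain centroids.
    """
    src_centroid = centroid([s for s, _ in in_domain], src_vectors, dim)
    tgt_centroid = centroid([t for _, t in in_domain], tgt_vectors, dim)
    selected = []
    for src, tgt in pool:
        score = min(
            cosine(sentence_vector(src, src_vectors, dim), src_centroid),
            cosine(sentence_vector(tgt, tgt_vectors, dim), tgt_centroid),
        )
        if score >= threshold:
            selected.append((src, tgt))
    return selected
```

In practice the word vectors would come from a trained embedding model for each language, and the threshold would be tuned on held-out data; the toy dictionaries used to exercise this sketch are not real embeddings.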