Neural machine translation with decoding history enhanced attention

M Wang, J Xie, Z Tan, J Su, D Xiong… - Proceedings of the 27th International Conference on Computational …, 2018 - aclanthology.org
Abstract
Neural machine translation with source-side attention has achieved remarkable performance. However, there has been little work exploring attention to the target side, which can potentially enhance the memory capability of NMT. We reformulate attention as a Decoding History Enhanced Attention mechanism (DHEA) to render the NMT model better at selecting both source-side and target-side information. DHEA enables dynamic control of the ratios at which source and target contexts contribute to the generation of target words, offering a way to weakly induce structural relations among both source and target tokens. It also allows training errors to be directly back-propagated through short-cut connections, effectively alleviating the gradient vanishing problem. An empirical study on Chinese-English translation shows that our model with proper configuration can improve by 0.9 BLEU upon the Transformer and the best reported results on the dataset. On the WMT14 English-German task and the larger WMT14 English-French task, our model achieves results comparable with the state-of-the-art.
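The abstract describes DHEA as dynamically mixing a source-attention context with a target-side (decoding-history) attention context, with short-cut connections for gradient flow. Below is a minimal PyTorch sketch of that gating idea; the module name, head count, and exact wiring are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class DecodingHistoryEnhancedAttention(nn.Module):
    """Illustrative gate between source attention and decoding-history attention."""

    def __init__(self, d_model: int, num_heads: int = 4):
        super().__init__()
        # Attention over encoder outputs (source side).
        self.src_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Attention over previously decoded states (target side / decoding history).
        self.hist_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Gate controlling the per-dimension ratio between the two contexts.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, query, enc_states, history):
        # query:      (batch, 1, d_model)  current decoder state
        # enc_states: (batch, S, d_model)  encoder outputs
        # history:    (batch, T, d_model)  decoder states generated so far
        c_src, _ = self.src_attn(query, enc_states, enc_states)
        c_hist, _ = self.hist_attn(query, history, history)
        g = torch.sigmoid(self.gate(torch.cat([c_src, c_hist], dim=-1)))
        # Gated mix of the two contexts plus a residual short-cut from the
        # query, letting errors back-propagate directly as the abstract notes.
        return query + g * c_src + (1.0 - g) * c_hist


# Toy usage: batch of 2, source length 10, 5 previously decoded tokens.
m = DecodingHistoryEnhancedAttention(d_model=16)
out = m(torch.randn(2, 1, 16), torch.randn(2, 10, 16), torch.randn(2, 5, 16))
print(out.shape)  # torch.Size([2, 1, 16])
```

The sigmoid gate realizes the "dynamic control of the ratios" mentioned above: when g is near 1 the model relies on the source context, and when near 0 it relies on its own decoding history.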