Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation

Xiang Lin, Simeng Han, Shafiq Joty
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:6642-6653, 2021.

Abstract

Advanced large-scale neural language models have led to significant success in many language generation tasks. However, the most commonly used training objective, Maximum Likelihood Estimation (MLE), has been shown to be problematic: models trained with it tend to prefer dull and repetitive phrases. In this work, we introduce ScaleGrad, a modification applied straight to the gradient of the loss function, to remedy the degeneration issue of the standard MLE objective. By directly manipulating the gradient information, ScaleGrad teaches the model to use novel tokens. Empirical results show the effectiveness of our method not only in open-ended generation but also in directed generation tasks. Owing to its architectural simplicity, our method can serve as a general training objective applicable to most neural text generation tasks.
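The abstract only describes the mechanism at a high level. As a rough, hypothetical illustration of the general idea (not the paper's exact ScaleGrad formulation), the sketch below rescales the probability mass of tokens that have not yet appeared in the prefix by a factor gamma before taking the negative log-likelihood; the function name, the novel_mask construction, and the gamma value are all assumptions made for illustration.

```python
# Illustrative sketch only (assumed names and values, not the paper's exact
# ScaleGrad formulation): down-weight the probability of "novel" tokens --
# tokens that have not yet appeared in the generated prefix -- by a factor
# gamma, renormalize, then take the negative log-likelihood.
import torch
import torch.nn.functional as F


def novelty_scaled_nll(logits, targets, novel_mask, gamma=0.2):
    """
    logits:     (batch, vocab) unnormalized next-token scores
    targets:    (batch,) ground-truth next-token ids
    novel_mask: (batch, vocab) bool, True for tokens not yet used in the prefix
    gamma:      scaling factor in (0, 1]; gamma = 1 recovers standard MLE
    """
    probs = F.softmax(logits, dim=-1)
    # Down-weight novel tokens by gamma, then renormalize to a distribution.
    scale = torch.where(novel_mask, gamma * torch.ones_like(probs),
                        torch.ones_like(probs))
    scaled = probs * scale
    scaled = scaled / scaled.sum(dim=-1, keepdim=True)
    # NLL on the rescaled distribution: a novel ground-truth token receives a
    # lower rescaled probability and hence a larger loss, so the training
    # signal favors placing mass on novel tokens relative to standard MLE.
    target_prob = scaled.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return -torch.log(target_prob + 1e-9).mean()
```

In the paper itself the modification is described as applied straight to the gradient of the MLE loss; the loss-level rescaling above is only meant to convey the flavor of adjusting the training signal based on token novelty.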

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-lin21b,
  title     = {Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation},
  author    = {Lin, Xiang and Han, Simeng and Joty, Shafiq},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {6642--6653},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/lin21b/lin21b.pdf},
  url       = {https://proceedings.mlr.press/v139/lin21b.html}
}
APA
Lin, X., Han, S., & Joty, S. (2021). Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:6642-6653. Available from https://proceedings.mlr.press/v139/lin21b.html.
