Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing

Meister, Clara; Salesky, Elizabeth; Cotterell, Ryan

arXiv:2005.00820 (cs)

[Submitted on 2 May 2020 (v1), last revised 12 May 2020 (this version, v2)]

Abstract: Prior work has explored directly regularizing the output distributions of probabilistic models to alleviate peaky (i.e., over-confident) predictions, a common sign of overfitting. This class of techniques, of which label smoothing is one, has a connection to entropy regularization. Despite the consistent success of label smoothing across architectures and data sets in language generation tasks, two problems remain open: (1) there is little understanding of the underlying effects entropy regularizers have on models, and (2) the full space of entropy regularization techniques is largely unexplored. We introduce a parametric family of entropy regularizers, which includes label smoothing as a special case, and use it to gain a better understanding of the relationship between the entropy of a model and its performance on language generation tasks. We also find that variance in model performance can be explained largely by the resulting entropy of the model. Lastly, we find that label smoothing provably does not allow for sparsity in an output distribution, an undesirable property for language generation models, and therefore advise the use of other entropy regularization methods in its place.
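The sparsity claim in the abstract can be illustrated with a small numerical sketch. This is not the paper's parametric family or its proof; it only compares the standard label smoothing loss with one other entropy regularizer (an entropy bonus, often called the confidence penalty) on a single output distribution. The function names and the coefficients `eps` and `beta` are assumptions chosen for illustration.

```python
import math

def cross_entropy(target, p):
    """H(target, p) = -sum_i target_i * log p_i, treating 0 * log 0 as 0."""
    total = 0.0
    for t, q in zip(target, p):
        if t == 0.0:
            continue          # zero target mass contributes nothing
        if q == 0.0:
            return math.inf   # positive target mass on a zero-probability class
        total -= t * math.log(q)
    return total

def entropy(p):
    """Shannon entropy of p in nats, treating 0 * log 0 as 0."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def label_smoothing_loss(gold, p, eps=0.1):
    """Cross-entropy against the one-hot gold label mixed with the uniform distribution."""
    k = len(p)
    smoothed = [(1.0 - eps) * (1.0 if i == gold else 0.0) + eps / k for i in range(k)]
    return cross_entropy(smoothed, p)

def confidence_penalty_loss(gold, p, beta=0.1):
    """Negative log-likelihood of the gold label minus beta times the entropy of p."""
    return -math.log(p[gold]) - beta * entropy(p)

# Hypothetical model outputs over four classes; the gold class is index 0.
sparse = [0.7, 0.3, 0.0, 0.0]   # assigns exactly zero probability to two classes
dense = [0.7, 0.1, 0.1, 0.1]    # assigns nonzero probability everywhere

for name, p in [("sparse", sparse), ("dense", dense)]:
    print(name, label_smoothing_loss(0, p), confidence_penalty_loss(0, p))
```

Because the smoothed target puts nonzero mass on every class, the label smoothing loss is infinite for any output distribution containing an exact zero, while the entropy-bonus loss stays finite as long as the gold class has nonzero probability.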

Comments:	Published as long paper at ACL 2020
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2005.00820 [cs.CL]
	(or arXiv:2005.00820v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2005.00820

Submission history

From: Clara Meister
[v1] Sat, 2 May 2020 12:46:28 UTC (2,256 KB)
[v2] Tue, 12 May 2020 06:22:06 UTC (2,254 KB)