The Evolved Transformer

So, David R.; Liang, Chen; Le, Quoc V.

Computer Science > Machine Learning

arXiv:1901.11117 (cs)

[Submitted on 30 Jan 2019 (v1), last revised 17 May 2019 (this version, v4)]

Title: The Evolved Transformer

Authors: David R. So, Chen Liang, Quoc V. Le

Abstract: Recent work has highlighted the strength of the Transformer architecture on sequence tasks, while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models. Our goal is to apply NAS to search for a better alternative to the Transformer. We first construct a large search space inspired by recent advances in feed-forward sequence models and then run evolutionary architecture search with warm starting, seeding our initial population with the Transformer. To search directly on the computationally expensive WMT 2014 English-German translation task, we develop the Progressive Dynamic Hurdles method, which allows us to dynamically allocate more resources to more promising candidate models. The architecture found in our experiments, the Evolved Transformer, demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech and LM1B. At a big model size, the Evolved Transformer establishes a new state-of-the-art BLEU score of 29.8 on WMT'14 English-German; at smaller sizes, it achieves the same quality as the original "big" Transformer with 37.6% fewer parameters and outperforms the Transformer by 0.7 BLEU at a mobile-friendly model size of 7M parameters.
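The abstract describes the search only at a high level: evolution warm-started from the Transformer, with Progressive Dynamic Hurdles (PDH) giving more training to more promising candidates. Below is a minimal Python sketch under stated assumptions, not the authors' implementation. Architectures are toy integer lists; `score_fn` is a hypothetical stand-in for "train for this many steps, then measure fitness"; parents are picked by tournament selection; the oldest population member is removed each generation; and, after every `hurdle_every` children, a new hurdle is set to the mean fitness of the current population. All names (`Candidate`, `mutate`, `evaluate_with_hurdles`, `evolve`) are hypothetical.

```python
import random
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical hook: fitness of an architecture after `steps` training steps.
ScoreFn = Callable[[Sequence[int], int], float]


@dataclass
class Candidate:
    genome: List[int]               # toy stand-in for an architecture encoding
    steps: int = 0                  # training steps spent on this candidate so far
    fitness: float = float("-inf")  # fitness after its most recent evaluation


def mutate(genome: Sequence[int], rng: random.Random, n_choices: int = 4) -> List[int]:
    """Copy `genome` and resample one randomly chosen gene from range(n_choices)."""
    child = list(genome)
    child[rng.randrange(len(child))] = rng.randrange(n_choices)
    return child


def evaluate_with_hurdles(c: Candidate, score_fn: ScoreFn,
                          stage_steps: Sequence[int], hurdles: Sequence[float]) -> None:
    """Train stage by stage; after stage i, continue only if hurdles[i] exists
    and the candidate's fitness reaches it."""
    for i, steps in enumerate(stage_steps):
        c.steps += steps
        c.fitness = score_fn(c.genome, c.steps)
        if i >= len(hurdles) or c.fitness < hurdles[i]:
            break


def evolve(seed_genome: Sequence[int], score_fn: ScoreFn, stage_steps: Sequence[int],
           population_size: int = 8, tournament_size: int = 3,
           num_children: int = 40, hurdle_every: int = 10,
           seed: int = 0) -> Tuple[Candidate, List[float]]:
    rng = random.Random(seed)
    # Warm start: the seed architecture itself plus mutated copies of it.
    population = [Candidate(list(seed_genome))]
    population += [Candidate(mutate(seed_genome, rng)) for _ in range(population_size - 1)]
    for c in population:
        evaluate_with_hurdles(c, score_fn, stage_steps[:1], [])  # first stage only
    history = list(population)
    hurdles: List[float] = []
    for n in range(1, num_children + 1):
        # Tournament selection: the fittest of a random sample becomes the parent.
        parent = max(rng.sample(population, tournament_size), key=lambda c: c.fitness)
        child = Candidate(mutate(parent.genome, rng))
        evaluate_with_hurdles(child, score_fn, stage_steps, hurdles)
        population.append(child)
        population.pop(0)  # remove the oldest member (assumed policy)
        history.append(child)
        # Every `hurdle_every` children, add a hurdle for the next stage.
        if n % hurdle_every == 0 and len(hurdles) < len(stage_steps) - 1:
            hurdles.append(mean(c.fitness for c in population))
    best = max(history, key=lambda c: c.fitness)
    return best, hurdles
```

For example, `evolve([0, 0, 0, 0], lambda g, steps: sum(g) + 0.001 * steps, [100, 200, 400])` returns the best candidate seen and the two hurdles created during the run. The idea the sketch tries to capture is that a child pays for longer training only after clearing an earlier, cheaper check; in a real search, `score_fn` would train a model for the given number of steps and report a held-out metric.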

Comments:	ICML version with SOTA results
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Cite as:	arXiv:1901.11117 [cs.LG]
	(or arXiv:1901.11117v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1901.11117

Submission history

From: David So
[v1] Wed, 30 Jan 2019 22:03:01 UTC (318 KB)
[v2] Wed, 6 Feb 2019 21:35:28 UTC (310 KB)
[v3] Fri, 15 Feb 2019 22:23:12 UTC (311 KB)
[v4] Fri, 17 May 2019 19:47:49 UTC (361 KB)