TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models

Li, Zhuohan; Zhuang, Siyuan; Guo, Shiyuan; Zhuo, Danyang; Zhang, Hao; Song, Dawn; Stoica, Ion

arXiv:2102.07988 (cs)

[Submitted on 16 Feb 2021 (v1), last revised 28 Sep 2021 (this version, v2)]

Abstract: Model parallelism has become a necessity for training modern large-scale deep language models. In this work, we identify a new dimension that is orthogonal to existing model-parallel approaches: it is possible to perform pipeline parallelism within a single training sequence for Transformer-based language models, thanks to their autoregressive property. This enables a more fine-grained pipeline than in previous work. With this key idea, we design TeraPipe, a high-performance token-level pipeline-parallel algorithm for synchronous model-parallel training of Transformer-based language models. We develop a novel dynamic-programming-based algorithm to calculate the optimal pipelining execution scheme for a given model and cluster configuration. We show that TeraPipe can speed up training by 5.0x for the largest GPT-3 model, with 175 billion parameters, on an AWS cluster of 48 p3.16xlarge instances compared with state-of-the-art model-parallel methods. The code for reproduction can be found at this https URL.
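The abstract does not spell out the dynamic program, so the following is only a minimal illustrative sketch under simplifying assumptions, not the authors' implementation. Because causal attention lets each token depend only on earlier tokens, a sequence can be cut into token slices that flow through the pipeline in order. Assume K identical stages, ignore communication and the backward pass, and let slice i take time t_i on every stage; the forward latency is then sum(t_i) + (K - 1) * max(t_i). The sketch fixes a bound t_max on the slowest slice, minimizes the sum of slice times with an O(L^2) dynamic program over prefix lengths, and enumerates every candidate t_max. The cost model in `slice_cost` and all function names are hypothetical.

```python
from typing import Callable, List, Tuple


# Hypothetical per-stage cost (arbitrary time units) of a forward pass over a
# slice of `length` tokens that follows `context` earlier tokens of the same
# sequence. The constants are illustrative assumptions, not measurements.
def slice_cost(length: int, context: int) -> float:
    overhead = 4.0   # fixed per-slice launch/communication cost
    per_token = 1.0  # work that grows linearly with slice length
    attn = 0.01      # each slice token attends to at most context + length positions
    return overhead + per_token * length + attn * length * (context + length)


def pipeline_latency(slices: List[int], num_stages: int,
                     cost: Callable[[int, int], float]) -> float:
    """Forward latency of identical stages processing slices in token order:
    sum of slice times plus (num_stages - 1) times the slowest slice."""
    times, context = [], 0
    for length in slices:
        times.append(cost(length, context))
        context += length
    return sum(times) + (num_stages - 1) * max(times)


def best_slicing(seq_len: int, num_stages: int,
                 cost: Callable[[int, int], float]) -> Tuple[float, List[int]]:
    # Candidate bounds for the slowest slice: every possible slice cost.
    candidates = sorted({cost(i - k, k) for i in range(1, seq_len + 1) for k in range(i)})
    best_latency, best_slices = float("inf"), []
    for t_max in candidates:
        # dp[i] = minimal total time to cover tokens [0, i) using slices costing <= t_max
        dp = [0.0] + [float("inf")] * seq_len
        prev = [-1] * (seq_len + 1)
        for i in range(1, seq_len + 1):
            for k in range(i):
                c = cost(i - k, k)
                if c <= t_max and dp[k] + c < dp[i]:
                    dp[i], prev[i] = dp[k] + c, k
        if dp[seq_len] == float("inf"):
            continue  # no slicing respects this bound
        # Walk back through `prev` to recover slice lengths in token order.
        slices, i = [], seq_len
        while i > 0:
            slices.append(i - prev[i])
            i = prev[i]
        slices.reverse()
        # Score the recovered slicing with its actual (not bounded) maximum.
        latency = pipeline_latency(slices, num_stages, cost)
        if latency < best_latency:
            best_latency, best_slices = latency, slices
    return best_latency, best_slices


if __name__ == "__main__":
    latency, slices = best_slicing(seq_len=32, num_stages=8, cost=slice_cost)
    print("slice lengths:", slices)
    print("pipelined latency:", round(latency, 2))
    print("single slice:", round(pipeline_latency([32], 8, slice_cost), 2))
```

The enumeration is exact under this model: when t_max equals the slowest slice of an optimal slicing, the dynamic program finds a slicing whose sum is no larger and whose maximum is no larger. Trying all O(L^2) candidate bounds with an O(L^2) dynamic program each costs O(L^4) cost evaluations, so at realistic sequence lengths the candidate set would need pruning or a coarser token granularity.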

Comments:	ICML 2021
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2102.07988 [cs.LG]
	(or arXiv:2102.07988v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2102.07988

Submission history

From: Zhuohan Li
[v1] Tue, 16 Feb 2021 07:34:32 UTC (304 KB)
[v2] Tue, 28 Sep 2021 05:04:28 UTC (7,647 KB)
