Block Pruning For Faster Transformers

Lagunas, François; Charlaix, Ella; Sanh, Victor; Rush, Alexander M.

arXiv:2109.04838 (cs)

[Submitted on 10 Sep 2021]

Abstract: Pre-training has improved model accuracy for both classification and generation tasks at the cost of introducing much larger and slower models. Pruning methods have proven to be an effective way of reducing model size, whereas distillation methods are proven for speeding up inference. We introduce a block pruning approach targeting both small and fast models. Our approach extends structured methods by considering blocks of any size and integrates this structure into the movement pruning paradigm for fine-tuning. We find that this approach learns to prune out full components of the underlying model, such as attention heads. Experiments consider classification and generation tasks, yielding among other results a pruned model that is a 2.4x faster, 74% smaller BERT on SQuAD v1, with a 1% drop on F1, competitive both with distilled models in speed and pruned models in size.
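To make the idea of pruning in blocks concrete, the following is a minimal sketch and not the authors' implementation. It assumes one importance score per block is already available (in movement pruning such scores are learned during fine-tuning; here they are hard-coded), keeps the highest-scoring fraction of blocks, and zeroes the rest. The function names, the toy matrix, the scores, and the top-fraction selection rule are all illustrative assumptions.

```python
def block_mask(scores, keep_fraction):
    """Return a 0/1 mask over blocks, keeping the highest-scoring fraction.

    Ties at the threshold are all kept, so slightly more blocks than
    requested may survive.
    """
    flat = sorted((s for row in scores for s in row), reverse=True)
    n_keep = max(1, round(keep_fraction * len(flat)))
    threshold = flat[n_keep - 1]
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def apply_block_mask(weight, mask, block_rows, block_cols):
    """Zero every weight whose enclosing block is masked out."""
    return [
        [w * mask[i // block_rows][j // block_cols] for j, w in enumerate(row)]
        for i, row in enumerate(weight)
    ]


def drop_empty_rows(weight):
    """Remove rows that are entirely zero, leaving a smaller dense matrix."""
    return [row for row in weight if any(w != 0 for w in row)]


# Toy 4x4 weight matrix split into 2x2 blocks, giving a 2x2 grid of scores.
weight = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
scores = [[0.9, 0.7], [-0.3, -0.8]]  # hypothetical block scores, not learned here

mask = block_mask(scores, keep_fraction=0.5)
pruned = apply_block_mask(weight, mask, block_rows=2, block_cols=2)
compact = drop_empty_rows(pruned)
```

In this toy case both blocks in the bottom half are pruned, so two full rows become zero and can be dropped, leaving a smaller dense matrix. This is the kind of effect the abstract alludes to when it says the approach learns to prune out full components such as attention heads.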

Comments:	EMNLP 2021. Code, hyper-parameters, evaluation results and checkpoints available at this https URL
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
ACM classes:	I.2.6; I.2.7
Cite as:	arXiv:2109.04838 [cs.LG]
	(or arXiv:2109.04838v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2109.04838

Submission history

From: François Lagunas
[v1] Fri, 10 Sep 2021 12:46:32 UTC (146 KB)
