BiT: Robustly Binarized Multi-distilled Transformer

Liu, Zechun; Oguz, Barlas; Pappu, Aasish; Xiao, Lin; Yih, Scott; Li, Meng; Krishnamoorthi, Raghuraman; Mehdad, Yashar

arXiv:2205.13016 (cs)

[Submitted on 25 May 2022 (v1), last revised 2 Oct 2022 (this version, v2)]

Abstract: Modern pre-trained transformers have rapidly advanced the state of the art in machine learning, but they have also grown in parameter count and computational complexity, making them increasingly difficult to deploy in resource-constrained environments. Binarizing the weights and activations of the network can significantly alleviate these issues, but it is technically challenging from an optimization perspective. In this work, we identify a series of improvements that enable binary transformers at a much higher accuracy than was previously possible. These include a two-set binarization scheme, a novel elastic binary activation function with learned parameters, and a method to quantize a network to its limit by successively distilling higher-precision models into lower-precision students. These approaches allow, for the first time, fully binarized transformer models at a practical level of accuracy, approaching a full-precision BERT baseline on the GLUE language understanding benchmark to within as little as 5.9%. Code and models are available at: this https URL.
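The abstract names a two-set binarization scheme and an elastic binary activation with learned parameters but gives no formulas. The following is a minimal forward-pass sketch of one plausible reading, not the authors' implementation. The function names, the scale `alpha`, and the threshold `beta` are assumptions introduced for illustration; in a trained model they would be learned, and the backward pass (for example a straight-through estimator) is omitted.

```python
import math


def binarize_signed(xs, alpha=1.0, beta=0.0):
    """Map each value to -alpha or +alpha, thresholded at beta.

    Hypothetical helper: intended for tensors that take both signs.
    """
    return [alpha if (x - beta) >= 0 else -alpha for x in xs]


def binarize_nonneg(xs, alpha=1.0, beta=0.0):
    """Map each value to 0 or alpha.

    Hypothetical helper: intended for tensors that are non-negative by
    construction (for example softmax outputs). Rounds (x - beta) / alpha
    half-up, then clips the result to the two levels 0 and 1.
    """
    out = []
    for x in xs:
        level = math.floor((x - beta) / alpha + 0.5)  # round half up
        level = min(1, max(0, level))                 # clip to {0, 1}
        out.append(alpha * level)
    return out


if __name__ == "__main__":
    # Mixed-sign inputs go to the signed set, non-negative inputs to {0, alpha}.
    print(binarize_signed([-0.3, 0.2, 1.4], alpha=0.5))
    print(binarize_nonneg([0.1, 0.6, 0.9], alpha=1.0))
```

Using two output sets is a design choice in this sketch: forcing a non-negative distribution onto symmetric levels would spend one of the two levels on values that never occur, so the second helper keeps zero as a level.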

Comments:	NeurIPS 2022
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2205.13016 [cs.LG]
	(or arXiv:2205.13016v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2205.13016

Submission history

From: Zechun Liu
[v1] Wed, 25 May 2022 19:01:54 UTC (812 KB)
[v2] Sun, 2 Oct 2022 19:47:28 UTC (841 KB)
