QuIP: 2-Bit Quantization of Large Language Models With Guarantees

Chee, Jerry; Cai, Yaohui; Kuleshov, Volodymyr; De Sa, Christopher

arXiv:2307.13304 (cs)

[Submitted on 25 Jul 2023 (v1), last revised 15 Jan 2024 (this version, v2)]

Abstract: This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from $\textit{incoherent}$ weight and Hessian matrices, i.e., from the weights being even in magnitude and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre- and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight. Our code can be found at this https URL.
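The two ingredients named in the abstract, a quadratic proxy objective and multiplication by random orthogonal matrices, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the proxy has the form tr((Ŵ − W) H (Ŵ − W)ᵀ), draws dense orthogonal matrices from a QR factorization (the abstract only says the processing is "efficient" and does not specify the construction), and uses plain nearest rounding as a stand-in for the adaptive rounding step. All function names, shapes, and the planted outlier are hypothetical.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs so the result is uniformly (Haar) distributed.
    return q * np.sign(np.diag(r))

def proxy_loss(w_hat, w, h):
    """Assumed quadratic proxy: tr((W_hat - W) H (W_hat - W)^T)."""
    d = w_hat - w
    return float(np.trace(d @ h @ d.T))

def round_to_grid(w, bits=2):
    """Nearest rounding onto 2**bits evenly spaced levels spanning [w.min(), w.max()].

    Stand-in only: QuIP's adaptive rounding is not reproduced here.
    """
    steps = 2 ** bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / steps
    return np.round((w - lo) / scale) * scale + lo

rng = np.random.default_rng(0)
m, n = 64, 128                      # hypothetical layer shape
w = rng.standard_normal((m, n))
w[0, 0] = 25.0                      # planted outlier: weights uneven in magnitude
x = rng.standard_normal((n, 512))   # hypothetical calibration inputs
h = x @ x.T / x.shape[1]            # symmetric PSD Hessian proxy

# Pre-processing: rotate weights and Hessian with random orthogonal matrices.
u = random_orthogonal(m, rng)
v = random_orthogonal(n, rng)
w_inc = u.T @ w @ v
h_inc = v.T @ h @ v

# Quantize in the rotated basis, then undo the rotation (post-processing).
w_hat_inc = round_to_grid(w_inc)
w_hat = u @ w_hat_inc @ v.T

loss_rotated = proxy_loss(w_hat_inc, w_inc, h_inc)
loss_original_basis = proxy_loss(w_hat, w, h)
loss_plain = proxy_loss(round_to_grid(w), w, h)

print("max |W| before / after rotation:", np.abs(w).max(), np.abs(w_inc).max())
print("proxy loss, plain 2-bit rounding:", loss_plain)
print("proxy loss, rotate then round   :", loss_original_basis)
```

Because U and V are orthogonal, the proxy loss measured in the rotated basis equals the loss of the de-rotated weights in the original basis, so the rotation changes nothing about what is being minimized. In this toy setup the rotation spreads the planted outlier across many entries, which shrinks the range the 2-bit grid has to cover.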

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2307.13304 [cs.LG]
	(or arXiv:2307.13304v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2307.13304

Submission history

From: Jerry Chee
[v1] Tue, 25 Jul 2023 07:44:06 UTC (87 KB)
[v2] Mon, 15 Jan 2024 21:54:28 UTC (108 KB)
