A Programmable Approach to Neural Network Compression

Joseph, Vinu; Muralidharan, Saurav; Garg, Animesh; Garland, Michael; Gopalakrishnan, Ganesh

doi:10.1109/MM.2020.3012391

Computer Science > Machine Learning

arXiv:1911.02497 (cs)

[Submitted on 6 Nov 2019 (v1), last revised 1 Dec 2020 (this version, v2)]

Title: A Programmable Approach to Neural Network Compression

Authors: Vinu Joseph, Saurav Muralidharan, Animesh Garg, Michael Garland, Ganesh Gopalakrishnan


Abstract: Deep neural networks (DNNs) frequently contain far more weights, represented at a higher precision, than are required for the specific task which they are trained to perform. Consequently, they can often be compressed using techniques such as weight pruning and quantization that reduce both the model size and inference time without appreciable loss in accuracy. However, finding the best compression strategy and corresponding target sparsity for a given DNN, hardware platform, and optimization objective currently requires expensive, frequently manual, trial-and-error experimentation. In this paper, we introduce a programmable system for model compression called Condensa. Users programmatically compose simple operators, in Python, to build more complex and practically interesting compression strategies. Given a strategy and user-provided objective (such as minimization of running time), Condensa uses a novel Bayesian optimization-based algorithm to automatically infer desirable sparsities. Our experiments on four real-world DNNs demonstrate memory footprint and hardware runtime throughput improvements of 188x and 2.59x, respectively, using at most ten samples per search. We have released a reference implementation of Condensa at this https URL.
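
The idea of composing simple compression operators into a larger strategy can be illustrated with a minimal, standard-library-only sketch. This is not Condensa's API: the names `Layer`, `prune`, `quantize_fp16`, `compose`, and `footprint_bytes` are hypothetical, the "model" is a flat tuple of weights, and the footprint estimate counts nonzero values only, ignoring sparse-index overhead.

```python
import random
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """Hypothetical stand-in for a layer: flat weights plus bits per value."""
    values: tuple
    bits: int = 32


def prune(sparsity):
    """Build an operator that zeroes the smallest-magnitude fraction of weights."""
    def op(layer):
        k = round(sparsity * len(layer.values))  # number of weights to zero
        order = sorted(range(len(layer.values)), key=lambda i: abs(layer.values[i]))
        drop = set(order[:k])
        kept = tuple(0.0 if i in drop else v for i, v in enumerate(layer.values))
        return Layer(kept, layer.bits)
    return op


def quantize_fp16():
    """Build an operator that rounds every weight to IEEE half precision."""
    def op(layer):
        # struct format 'e' packs a float as a 16-bit half-precision value.
        rounded = tuple(struct.unpack("e", struct.pack("e", v))[0] for v in layer.values)
        return Layer(rounded, 16)
    return op


def compose(*ops):
    """Chain operators left to right into a single compression strategy."""
    def op(layer):
        for f in ops:
            layer = f(layer)
        return layer
    return op


def footprint_bytes(layer):
    """Bytes for nonzero values only (assumption: index overhead is ignored)."""
    nonzero = sum(1 for v in layer.values if v != 0.0)
    return nonzero * layer.bits // 8


rng = random.Random(0)
original = Layer(tuple(rng.gauss(0.0, 1.0) for _ in range(1000)))

# A strategy is just a composition of simple operators.
strategy = compose(prune(0.9), quantize_fp16())
compressed = strategy(original)
ratio = footprint_bytes(original) / footprint_bytes(compressed)
```

Under these assumptions, pruning to 90% sparsity contributes a 10x reduction and halving the precision contributes another 2x. The target sparsity is fixed by hand here; the abstract describes inferring it automatically with a Bayesian optimization-based search, which this sketch does not attempt.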

Comments:	This is an updated version of a paper published in IEEE Micro, vol. 40, no. 5, pp. 17-25, Sept.-Oct. 2020 at this https URL
Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Cite as:	arXiv:1911.02497 [cs.LG]
	(or arXiv:1911.02497v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1911.02497
Journal reference:	IEEE Micro, Volume: 40, Issue: 5, Sept.-Oct. 2020, pp. 17-25
Related DOI:	https://doi.org/10.1109/MM.2020.3012391

Submission history

From: Vinu Joseph
[v1] Wed, 6 Nov 2019 17:14:32 UTC (5,734 KB)
[v2] Tue, 1 Dec 2020 22:55:11 UTC (15,816 KB)
