Towards automatic model compression via a unified two-stage framework

W Chen, P Wang, J Cheng - Pattern Recognition, 2023 - Elsevier
Abstract
Deep Neural Networks have become ubiquitous across various domains, yet their massive storage and computation costs hinder deployment in real-world applications. This paper proposes a novel, unified two-stage framework for automatic model compression. To determine the compression ratio of each layer, we improve the optimization in two respects. First, to predict the performance of each compression policy, we propose Dynamic BN, which improves the prediction correlation significantly with little computation overhead. Second, to search for the compression-ratio allocation, we propose an efficient, hyperparameter-free solving algorithm based on the proposed Hessian-matrix approximation and Knapsack-problem reformulation. Comprehensive experiments and analyses on the CIFAR-100 and ImageNet datasets, across various network architectures, demonstrate its performance advantages over existing model compression methods in the quantization-only, pruning-only, and joint pruning-quantization settings.
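The "Dynamic BN" idea named in the abstract can be read as recomputing BatchNorm statistics for each candidate compressed sub-network before scoring it, since the full model's running statistics are stale for a compressed network. The sketch below illustrates that recalibration step under this assumption; the function names and toy shapes are illustrative, not from the paper.

```python
import numpy as np

def bn_forward(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Standard per-channel BatchNorm inference transform."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def recalibrate_bn(activations):
    """Re-estimate per-channel mean/variance from a small calibration
    batch of the compressed sub-network's activations, shaped
    [batch, channels]. These replace the full model's stale running
    statistics when predicting the policy's performance."""
    return activations.mean(axis=0), activations.var(axis=0)

# Illustrative use: activations whose distribution shifted after compression.
rng = np.random.default_rng(0)
acts = rng.normal(loc=3.0, scale=2.0, size=(1024, 8))
mean, var = recalibrate_bn(acts)
normalized = bn_forward(acts, mean, var)
```

With recalibrated statistics the normalized activations are again approximately zero-mean and unit-variance, which is the property that makes the cheap performance prediction reliable.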
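The Knapsack reformulation mentioned for compression-ratio allocation can be sketched as a multiple-choice knapsack: each layer picks exactly one compression option with a resource cost and a sensitivity penalty (standing in for the paper's Hessian-based estimates), subject to a total budget. The dynamic program below is a minimal sketch under that assumption; the option lists and numbers are hypothetical, not the paper's.

```python
def allocate_ratios(options, budget):
    """options[i]: list of (cost, penalty) choices for layer i.
    Pick one choice per layer with total cost <= budget, minimizing
    total penalty. Returns (best_penalty, choice_index_per_layer).
    Penalties stand in for Hessian-weighted accuracy-loss estimates."""
    # dp maps used-budget -> (best penalty so far, choices taken)
    dp = {0: (0.0, [])}
    for choices in options:
        nxt = {}
        for used, (pen, picks) in dp.items():
            for j, (cost, p) in enumerate(choices):
                nb = used + cost
                if nb > budget:
                    continue  # over the resource budget, prune this path
                cand = (pen + p, picks + [j])
                if nb not in nxt or cand[0] < nxt[nb][0]:
                    nxt[nb] = cand
        dp = nxt
    return min(dp.values(), key=lambda v: v[0]) if dp else None

# Two layers, three/two candidate ratios each: (cost, penalty).
layer_options = [
    [(4, 0.0), (2, 0.5), (1, 2.0)],  # layer 1: keep / halve / quarter
    [(4, 0.0), (2, 0.1)],            # layer 2: keep / halve
]
best = allocate_ratios(layer_options, budget=6)
# best -> (0.1, [0, 1]): keep layer 1, halve the less sensitive layer 2
```

Because costs are integer-valued here, the state space stays small; a pseudo-polynomial DP of this shape is hyperparameter-free, which matches the property the abstract claims for the solving algorithm.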