CNN models acceleration using filter pruning and sparse tensor core

HX Wei, P Liu, DY Hong, JJ Wu… - International Journal of …, 2022 - jstage.jst.go.jp

International Journal of Networking and Computing, 2022•jstage.jst.go.jp

Abstract

Convolutional neural networks (CNNs) are a state-of-the-art technique in machine learning and have achieved high accuracy in many computer vision applications. The number of parameters in CNN models is increasing rapidly to improve accuracy; as a result, the models require more computation time and memory for both training and inference. Reducing model size and improving inference speed have therefore become critical issues for CNNs. This paper focuses on filter pruning and on optimization for the NVIDIA sparse tensor core. Filter pruning is a model compression technique that evaluates the importance of the filters in a CNN model and removes the less critical ones. The NVIDIA sparse tensor core is special hardware for CNN computation in the NVIDIA Ampere GPU architecture that can speed up a matrix multiplication if the matrix follows a 2:4 pattern. This paper proposes hybrid pruning, which combines filter pruning and 2:4 pruning. We first apply filter pruning to remove redundant filters and reduce the model size. Next, we apply 2:4 pruning so that the model follows the 2:4 pattern and can use the sparse tensor core hardware for speedup. For this hybrid pruning scenario, we also propose two hybrid ranking metrics to decide each filter's importance during filter pruning. The hybrid ranking metrics preserve the essential filters for both pruning steps and achieve higher accuracy than …
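The two pruning steps described in the abstract can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the paper's method: the filter ranking here uses a plain L1-norm score (the paper's two hybrid ranking metrics are not given in the abstract), the 2:4 step keeps the two largest-magnitude values in every group of four consecutive weights, and the function names `prune_filters_l1` and `prune_2_4` are hypothetical.

```python
import numpy as np


def prune_filters_l1(weights, keep_ratio):
    """Keep the filters with the largest L1 norm.

    weights: array of shape (out_channels, in_channels, kh, kw).
    The L1 score is an assumption; it stands in for the paper's hybrid metrics.
    """
    n_keep = max(1, int(round(weights.shape[0] * keep_ratio)))
    scores = np.abs(weights).sum(axis=(1, 2, 3))
    # Indices of the n_keep highest-scoring filters, restored to original order.
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return weights[keep], keep


def prune_2_4(matrix):
    """Zero the two smallest-magnitude values in each group of four along a row."""
    rows, cols = matrix.shape
    if cols % 4 != 0:
        raise ValueError("number of columns must be a multiple of 4")
    groups = matrix.reshape(rows, cols // 4, 4)
    order = np.argsort(np.abs(groups), axis=-1)  # ascending by magnitude
    mask = np.ones(groups.shape, dtype=bool)
    np.put_along_axis(mask, order[..., :2], False, axis=-1)
    return (groups * mask).reshape(rows, cols)


# Hybrid pruning sketch: filter pruning first, then 2:4 pruning.
rng = np.random.default_rng(0)
conv_weights = rng.normal(size=(8, 4, 3, 3))
kept, kept_idx = prune_filters_l1(conv_weights, keep_ratio=0.5)
sparse = prune_2_4(kept.reshape(kept.shape[0], -1))
```

Flattening each remaining filter into one matrix row is a simplification; the layout that real sparse tensor core kernels expect depends on the library and is not covered here.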
