Arbitrary bit-width network: A joint layer-wise quantization and adaptive inference approach
Proceedings of the 30th ACM International Conference on Multimedia, 2022
Conventional model quantization methods apply a fixed quantization scheme to all data samples, ignoring the inherent "recognition difficulty" differences among samples. We propose to apply varying quantization schemes to different data samples, achieving data-dependent dynamic inference at a fine-grained layer level. However, enabling this adaptive inference with changeable layer-wise quantization schemes is challenging because the number of combinations of bit-widths and layers grows exponentially, making it extremely difficult to train a single model over such a vast search space and use it in practice. To solve this problem, we present the Arbitrary Bit-width Network (ABN), in which the bit-widths of a single deep network can change at runtime for different data samples, with layer-wise granularity. Specifically, we first build a weight-shared, layer-wise quantizable "super-network" in which each layer can be allocated multiple bit-widths and thus quantized differently on demand. The super-network provides a large number of combinations of bit-widths and layers, each of which can be used during inference without retraining or storing myriad models. Second, based on the well-trained super-network, each layer's runtime bit-width selection is modeled as a Markov Decision Process (MDP) and solved by a corresponding adaptive inference strategy. Experiments show that the super-network can be built without accuracy degradation and that the bit-width allocation of each layer can be adjusted on the fly to deal with various inputs. On ImageNet classification, we achieve a 1.1% top-1 accuracy improvement while saving 36.2% of BitOps.
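To make the "super-network" idea concrete, below is a minimal PyTorch sketch (not the authors' code) of a weight-shared convolution whose single set of full-precision weights can be fake-quantized to any of several bit-widths at runtime. The class name `SwitchableQuantConv2d`, the method `set_bitwidth`, and the bit choices (2, 4, 8) are illustrative assumptions; ABN's actual quantizer and training scheme may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableQuantConv2d(nn.Conv2d):
    """Convolution whose single set of shared FP32 weights can be
    fake-quantized to a bit-width chosen at runtime (sketch, not ABN's code)."""

    def __init__(self, *args, bit_choices=(2, 4, 8), **kwargs):
        super().__init__(*args, **kwargs)
        self.bit_choices = bit_choices
        self.bits = max(bit_choices)  # default to the highest precision

    def set_bitwidth(self, bits):
        assert bits in self.bit_choices, "unsupported bit-width"
        self.bits = bits

    def forward(self, x):
        # Symmetric uniform fake quantization of the shared weights.
        qmax = 2 ** (self.bits - 1) - 1             # e.g. 127 for 8 bits
        scale = self.weight.abs().max() / qmax      # per-tensor scale
        w_q = torch.clamp(torch.round(self.weight / scale), -qmax, qmax) * scale
        return F.conv2d(x, w_q, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# Hypothetical usage: one layer, three usable precisions, no retraining.
layer = SwitchableQuantConv2d(3, 16, kernel_size=3, padding=1)
layer.set_bitwidth(4)
out = layer(torch.randn(1, 3, 32, 32))
```

Stacking such layers gives one set of shared weights that supports every layer-wise bit-width combination, which is what allows each configuration to be used at inference without storing a separate model per configuration.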
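The runtime decision process can likewise be sketched as a loop that selects a bit-width for each layer just before executing it. This is only a hedged illustration of the MDP framing, in which the intermediate activation plays the role of the state and the chosen bit-width the action; the `policy` callable and `adaptive_forward` helper are hypothetical, and the paper's actual policy architecture and training procedure are not reproduced here.

```python
import torch

@torch.no_grad()
def adaptive_forward(layers, policy, x, bit_choices=(2, 4, 8)):
    """Run one sample through the super-network, choosing a bit-width
    for each layer on the fly (greedy action from a learned policy)."""
    for layer in layers:
        logits = policy(x)                     # state -> action logits
        action = int(logits.argmax(dim=-1))    # greedy bit-width choice
        layer.set_bitwidth(bit_choices[action])
        x = layer(x)                           # execute at the chosen precision
    return x
```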