Finding optimal numerical format for sub-8-bit post-training quantization of vision transformers

J Lee, Y Hwang, J Choi - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023 - ieeexplore.ieee.org
Vision Transformers (ViTs) have gained significant attention for their exceptional model accuracies on computer vision applications, but their demanding memory requirements and computational complexity have hindered active deployment. Post-training quantization (PTQ) is a practical method to tackle this challenge by directly reducing ViT's bit-precision. However, diverse data characteristics across different operations of ViT cannot be well captured solely by a single numerical format (fixed or floating-point). This work proposes an analytical framework that optimizes the numerical format of each matrix multiplication of ViTs for mixed-format sub-8-bit quantization. The extensive evaluation demonstrates that the proposed method can reduce the PTQ error and achieve state-of-the-art accuracy for popular ViT models.
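
The paper's actual analytical framework for assigning a format to each matrix multiplication is not reproduced in this snippet. As a rough illustration of the underlying idea only, the sketch below (hypothetical helper names, NumPy, a simplified 6-bit fixed-point format and an E3M2 floating-point format without subnormal handling) picks whichever candidate format yields the lower mean-squared quantization error for a given weight matrix.

```python
import numpy as np

def quantize_fixed(x, bits=6):
    """Symmetric fixed-point (integer) quantization with a per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    amax = float(np.max(np.abs(x)))
    scale = amax / qmax if amax > 0 else 1.0
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def quantize_float(x, exp_bits=3, man_bits=2):
    """Simplified low-bit floating-point quantization (sign + exp_bits + man_bits),
    ignoring subnormals, NaN/Inf encodings, and per-tensor scaling."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = 2 ** exp_bits - 1 - bias
    min_exp = 1 - bias
    sign = np.sign(x)
    mag = np.abs(x)
    # Exponent of each value, clamped to the representable range.
    exp = np.floor(np.log2(np.maximum(mag, np.finfo(np.float64).tiny)))
    exp = np.clip(exp, min_exp, max_exp)
    # Round the mantissa to man_bits fractional bits.
    mant = np.round(mag / 2.0 ** exp * 2 ** man_bits) / 2 ** man_bits
    y = sign * mant * 2.0 ** exp
    # Clamp to the largest representable magnitude.
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp
    return np.clip(y, -max_val, max_val)

def pick_format(weight, bits=6):
    """Choose, per weight matrix, the sub-8-bit format with the lower MSE."""
    candidates = {
        "fixed": quantize_fixed(weight, bits),
        "float_e3m2": quantize_float(weight, exp_bits=3, man_bits=2),
    }
    errors = {name: float(np.mean((weight - q) ** 2)) for name, q in candidates.items()}
    return min(errors, key=errors.get), errors

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Heavy-tailed values (common in some ViT activations) tend to favor floating-point,
    # while near-uniform values tend to favor fixed-point.
    w = rng.standard_t(df=3, size=(64, 64))
    fmt, errs = pick_format(w)
    print(fmt, errs)
```

In this toy criterion the choice is driven purely by per-matrix quantization MSE; the paper's framework instead derives the format choice analytically from the data characteristics of each ViT operation.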