Interpreting Universal Adversarial Example Attacks on Image Classification Models

Y Ding, F Tan, J Geng, Z Qin, M Cao, KKR Choo, Z Qin
IEEE Transactions on Dependable and Secure Computing, 2022 (ieeexplore.ieee.org)
Mitigating adversarial deep learning attacks remains challenging, partly because of the ease and low cost of carrying out such attacks. Therefore, in this article, we focus on understanding universal adversarial example attacks on image classification models. Specifically, we seek to understand the difference(s) between the adversarial examples in two adversarial datasets (DAmageNet and a PGD dataset) and the clean examples in ImageNet, as learned by the classification model, and whether such findings can be used to resist adversarial example attacks. We also seek to determine whether a discriminator can be retrained, using adversarial training, to decide whether an input image is an adversarial example. We then design a number of experiments (e.g., class activation map (CAM) analysis, feature map analysis, feature map/filter modification, adversarial training, and a binary classification model) to help us determine whether a universal adversarial dataset can be used to successfully attack the classification model. This, in turn, contributes to a better understanding of adversarial defenses for pretrained classification models from an interpretation perspective. To the best of our knowledge, this work is one of the earliest to systematically investigate, both visually and quantitatively, the interpretation of universal adversarial example attacks on image classification models.
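For context, the PGD dataset referenced in the abstract is built by perturbing clean ImageNet images with projected gradient descent. Below is a minimal sketch of an L-infinity PGD attack in PyTorch; the epsilon budget, step size, and iteration count are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

def pgd_attack(model, images, labels, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD: repeatedly step along the sign of the loss gradient,
    projecting back into the eps-ball around the clean images each time."""
    model.eval()
    loss_fn = nn.CrossEntropyLoss()

    # Random start inside the eps-ball (standard for PGD).
    adv = images.clone().detach()
    adv = adv + torch.empty_like(adv).uniform_(-eps, eps)
    adv = torch.clamp(adv, 0, 1).detach()

    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        # Gradient-sign ascent step, then project onto the eps-ball
        # and back into the valid pixel range.
        adv = adv.detach() + alpha * grad.sign()
        adv = torch.clamp(adv, images - eps, images + eps)
        adv = torch.clamp(adv, 0, 1)
    return adv.detach()
```

Each returned image stays within an eps-ball of its clean counterpart, which is why such perturbations are typically imperceptible yet still flip the model's prediction.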
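The CAM analysis in the experiments visualizes which image regions drive the classifier's decision, allowing clean and adversarial inputs to be compared. Here is a minimal sketch of the original class activation mapping for a GAP-based torchvision ResNet; the hook target (layer4) and the upsampling step are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def class_activation_map(model, image, class_idx):
    """Original CAM (Zhou et al., 2016): weight the final conv feature maps
    by the fc-layer weights of the target class and sum over channels.
    Assumes a torchvision ResNet; image is a normalized (3, H, W) tensor."""
    model.eval()
    features = {}

    # Capture the output of the last convolutional stage.
    def hook(module, inp, out):
        features["maps"] = out.detach()

    handle = model.layer4.register_forward_hook(hook)
    with torch.no_grad():
        model(image.unsqueeze(0))
    handle.remove()

    fmaps = features["maps"][0]            # (C, h, w)
    weights = model.fc.weight[class_idx]   # (C,)
    cam = torch.einsum("c,chw->hw", weights, fmaps)
    cam = F.relu(cam)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    # Upsample to the input resolution for overlaying on the image.
    cam = F.interpolate(cam[None, None], size=image.shape[1:],
                        mode="bilinear", align_corners=False)[0, 0]
    return cam
```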