Interpreting Universal Adversarial Example Attacks on Image Classification Models

Y Ding, F Tan, J Geng, Z Qin, M Cao, KKR Choo, Z Qin
IEEE Transactions on Dependable and Secure Computing, 2022 (ieeexplore.ieee.org)
Mitigating adversarial deep learning attacks remains challenging, partly because of the ease and low cost of carrying out such attacks. Therefore, in this article, we focus on understanding universal adversarial example attacks on image classification models. Specifically, we seek to understand the difference(s) between the adversarial examples in two adversarial datasets (DAmageNet and a PGD dataset) and the clean examples in ImageNet, as learned by the classification model, and whether such findings can be used to resist adversarial example attacks. We also seek to determine whether a discriminator can be retrained, using adversarial training, to decide whether an input image is an adversarial example. We then design a number of experiments (e.g., class activation map (CAM) analysis, feature map analysis, feature map/filter modification, adversarial training, and a binary classification model) to help us determine whether a universal adversarial dataset can be used to successfully attack the classification model. This, in turn, contributes to a better understanding of adversarial defenses for pretrained classification models from an interpretation perspective. To the best of our knowledge, this work is one of the earliest to systematically investigate, both visually and quantitatively, the interpretation of universal adversarial example attacks on image classification models.
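For context, the PGD dataset referenced in the abstract is built by perturbing clean ImageNet images with projected gradient descent. Below is a minimal sketch of an L-infinity PGD attack in PyTorch; the epsilon budget, step size, and iteration count are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

def pgd_attack(model, images, labels, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD: repeatedly step along the sign of the loss gradient,
    projecting back into the eps-ball around the clean images each time."""
    model.eval()
    loss_fn = nn.CrossEntropyLoss()

    # Random start inside the eps-ball (standard for PGD).
    adv = images.clone().detach()
    adv = adv + torch.empty_like(adv).uniform_(-eps, eps)
    adv = torch.clamp(adv, 0, 1).detach()

    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        # Gradient-sign ascent step, then project onto the eps-ball
        # and back into the valid pixel range.
        adv = adv.detach() + alpha * grad.sign()
        adv = torch.clamp(adv, images - eps, images + eps)
        adv = torch.clamp(adv, 0, 1)
    return adv.detach()
```

Each returned image stays within an eps-ball of its clean counterpart, which is why such perturbations are typically imperceptible yet still flip the model's prediction.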
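The CAM analysis in the experiments visualizes which image regions drive the classifier's decision, allowing clean and adversarial inputs to be compared. Here is a minimal sketch of the original class activation mapping for a GAP-based torchvision ResNet; the hook target (layer4) and the upsampling step are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def class_activation_map(model, image, class_idx):
    """Original CAM (Zhou et al., 2016): weight the final conv feature maps
    by the fc-layer weights of the target class and sum over channels.
    Assumes a torchvision ResNet; image is a normalized (3, H, W) tensor."""
    model.eval()
    features = {}

    # Capture the output of the last convolutional stage.
    def hook(module, inp, out):
        features["maps"] = out.detach()

    handle = model.layer4.register_forward_hook(hook)
    with torch.no_grad():
        model(image.unsqueeze(0))
    handle.remove()

    fmaps = features["maps"][0]            # (C, h, w)
    weights = model.fc.weight[class_idx]   # (C,)
    cam = torch.einsum("c,chw->hw", weights, fmaps)
    cam = F.relu(cam)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    # Upsample to the input resolution for overlaying on the image.
    cam = F.interpolate(cam[None, None], size=image.shape[1:],
                        mode="bilinear", align_corners=False)[0, 0]
    return cam
```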