Multi-scale Features Destructive Universal Adversarial Perturbations
H Wu, H Li, J Zhang, W Zhou, L Guo, Y Dong - International Conference on Information and Communications Security, 2023 - Springer
Abstract
Deep Neural Networks (DNNs) suffer from adversarial attacks, in which imperceptible perturbations added to examples cause incorrect predictions. Generally, there are two types of adversarial attack methods, i.e., image-dependent and image-agnostic. Image-dependent attacks craft a unique adversarial perturbation for each clean example, whereas image-agnostic attacks craft a single universal adversarial perturbation (UAP) that can fool the target model on all clean examples. However, existing UAP methods only utilize the output of the target DNN within a limited perturbation magnitude, so the UAP is applied ineffectively to the DNN's entire feature-extraction process. In this paper, we consider the difference between the mid-level features of a clean example and its corresponding adversarial example at different intermediate layers of the target DNN. Specifically, we maximize the impact of the adversarial examples during forward propagation by pulling apart the feature representations of the clean and adversarial examples. Moreover, to achieve both targeted and non-targeted attacks, we design a loss function that highlights the UAP feature representation to guide the direction of the perturbations in the feature layers. Furthermore, to reduce the training time and the number of training parameters, we adopt a direct optimization approach to craft UAPs and experimentally demonstrate that we can achieve a higher fooling rate with fewer examples. Extensive experimental results show that our approach outperforms state-of-the-art methods in both non-targeted and targeted universal attacks.
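
The abstract describes the method only at a high level; the following minimal PyTorch sketch illustrates the general idea of directly optimizing a single universal perturbation so that mid-level features of perturbed inputs are pulled apart from those of clean inputs at intermediate layers. This is not the authors' implementation: the model (VGG-16), the hooked layers, the L-infinity budget, the learning rate, and the stand-in data loader are all assumptions made for illustration.

import torch
import torch.nn.functional as F
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"
# Frozen target model (VGG-16 is an assumption, not the paper's stated choice).
model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).to(device).eval()
for p in model.parameters():
    p.requires_grad_(False)

# Capture mid-level features at chosen intermediate layers via forward hooks.
feats = {}
def make_hook(name):
    def hook(module, inputs, output):
        feats[name] = output
    return hook

layers = {"conv3_3": model.features[14], "conv4_3": model.features[21]}
for name, layer in layers.items():
    layer.register_forward_hook(make_hook(name))

# Stand-in for a small loader of clean examples (assumption: random tensors).
loader = [(torch.rand(8, 3, 224, 224), None) for _ in range(10)]

eps = 10 / 255                                     # L-inf budget (assumed)
delta = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.01)           # direct optimization of the UAP

for epoch in range(5):
    for x, _ in loader:
        x = x.to(device)
        model(x)                                   # clean forward pass
        clean = {k: v.detach() for k, v in feats.items()}
        model((x + delta).clamp(0, 1))             # perturbed forward pass
        # Non-targeted objective: push perturbed features away from the clean
        # ones at every hooked layer (minimize the negated feature distance).
        loss = -sum(F.mse_loss(feats[k], clean[k]) for k in layers)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                # keep the UAP imperceptible

For the targeted setting described in the abstract, the same loop would presumably flip the objective, pulling the perturbed features toward a target-class feature representation rather than away from the clean ones; the paper's specific loss formulation is not reproduced here.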