Jacobian Norm with Selective Input Gradient Regularization for Improved and Interpretable Adversarial Defense

Liu, Deyin; Wu, Lin; Zhao, Haifeng; Boussaid, Farid; Bennamoun, Mohammed; Xie, Xianghua

arXiv:2207.13036 (cs)

[Submitted on 9 Jul 2022 (v1), last revised 14 Nov 2022 (this version, v4)]

Abstract: Deep neural networks (DNNs) are known to be vulnerable to adversarial examples crafted with imperceptible perturbations: a small change in an input image can induce a misclassification, which threatens the reliability of deployed deep-learning systems. Adversarial training (AT) is often adopted to improve robustness by training on a mixture of corrupted and clean data. However, most AT-based methods are ineffective against transferred adversarial examples, which are generated to fool a wide spectrum of defense models, and thus cannot satisfy the generalization requirements of real-world scenarios. Moreover, adversarially training a defense model generally does not produce interpretable predictions for perturbed inputs, whereas domain experts require a highly interpretable robust model to understand the behaviour of a DNN. In this work, we propose a novel approach based on Jacobian norm and Selective Input Gradient Regularization (J-SIGR), which encourages linearized robustness through Jacobian normalization and also regularizes the perturbation-based saliency maps to imitate the model's interpretable predictions. As such, we achieve both improved defense and high interpretability of DNNs. Finally, we evaluate our method across different architectures against powerful adversarial attacks. Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions from the neural network are easy to interpret.

Comments:	Under review
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Cite as:	arXiv:2207.13036 [cs.LG]
	(or arXiv:2207.13036v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2207.13036

Submission history

From: Lin Wu
[v1] Sat, 9 Jul 2022 01:06:41 UTC (7,266 KB)
[v2] Wed, 27 Jul 2022 09:26:30 UTC (7,271 KB)
[v3] Sun, 28 Aug 2022 02:43:47 UTC (7,271 KB)
[v4] Mon, 14 Nov 2022 09:46:09 UTC (7,139 KB)