Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs

Benz, Philipp; Ham, Soomin; Zhang, Chaoning; Karjauv, Adil; Kweon, In So

Computer Science > Computer Vision and Pattern Recognition

arXiv:2110.02797 (cs)

[Submitted on 6 Oct 2021 (v1), last revised 11 Oct 2021 (this version, v2)]

Abstract: Convolutional Neural Networks (CNNs) have become the de facto gold standard in computer vision applications over the past few years. Recently, however, new model architectures have been proposed that challenge the status quo. The Vision Transformer (ViT) relies solely on attention modules, while the MLP-Mixer architecture substitutes the self-attention modules with Multi-Layer Perceptrons (MLPs). Despite their great success, CNNs are widely known to be vulnerable to adversarial attacks, which raises serious concerns for security-sensitive applications. It is therefore critical for the community to know whether the newly proposed ViT and MLP-Mixer are also vulnerable to adversarial attacks. To this end, we empirically evaluate their adversarial robustness under several adversarial attack setups and benchmark them against widely used CNNs. Overall, we find that the two architectures, especially ViT, are more robust than the CNN models. Using a toy example, we also provide empirical evidence that the lower adversarial robustness of CNNs can be partially attributed to their shift-invariant property. Our frequency analysis suggests that the most robust ViT architectures tend to rely more on low-frequency features than CNNs do. Additionally, we make the intriguing finding that MLP-Mixer is extremely vulnerable to universal adversarial perturbations.
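For readers unfamiliar with the term, the kind of adversarial attack mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the specific attacks evaluated, so the one-step fast gradient sign method (FGSM) is used here only as a standard example, not as the authors' setup. The toy logistic-regression model, its weights, the input, and the helper names (`predict_prob`, `fgsm`) are all hypothetical and stand in for the image classifiers the paper studies.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_prob(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """One-step L-infinity attack: x + eps * sign(d loss / d x).

    For cross-entropy loss on logistic regression,
    d loss / d x = (p - y) * w, where p is the predicted probability.
    """
    p = predict_prob(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy model and input (assumptions, not from the paper).
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.5, -0.5, 1.0], 1

x_adv = fgsm(w, b, x, y, eps=0.6)
clean_p = predict_prob(w, b, x)      # confidence in the true class before the attack
adv_p = predict_prob(w, b, x_adv)    # confidence in the true class after the attack
```

In this toy setting the clean input is classified as class 1 with probability above 0.5, while the perturbed input falls below 0.5, even though no coordinate moved by more than `eps`. Robustness comparisons of the kind described in the abstract measure how often such bounded perturbations change a model's prediction.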

Comments:	Code: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2110.02797 [cs.CV]
	(or arXiv:2110.02797v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2110.02797

Submission history

From: Philipp Benz [view email]
[v1] Wed, 6 Oct 2021 14:18:47 UTC (1,175 KB)
[v2] Mon, 11 Oct 2021 14:28:50 UTC (1,176 KB)
