Achieving Model Robustness through Discrete Adversarial Training

Ivgi, Maor; Berant, Jonathan

arXiv:2104.05062 (cs)

[Submitted on 11 Apr 2021 (v1), last revised 31 Oct 2021 (this version, v2)]

Abstract: Discrete adversarial attacks are symbolic perturbations to a language input that preserve the output label but lead to a prediction error. While such attacks have been extensively explored for the purpose of evaluating model robustness, their utility for improving robustness has been limited to offline augmentation only. Concretely, given a trained model, attacks are used to generate perturbed (adversarial) examples, and the model is re-trained exactly once. In this work, we address this gap and leverage discrete attacks for online augmentation, where adversarial examples are generated at every training step, adapting to the changing nature of the model. We propose (i) a new discrete attack, based on best-first search, and (ii) random sampling attacks that, unlike prior work, are not based on expensive search-based procedures. Surprisingly, we find that random sampling leads to impressive gains in robustness, outperforming the commonly used offline augmentation, while leading to a speedup at training time of ~10x. Furthermore, online augmentation with search-based attacks justifies the higher training cost, significantly improving robustness on three datasets. Last, we show that our new attack substantially improves robustness compared to prior methods.
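
The online-augmentation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the synonym table, the function names, and the `loss_fn` callable (standing in for the loss of the model at the current training step) are all hypothetical, and the attack shown is a generic random sampling attack rather than the specific procedure from the paper.

```python
import random

# Hypothetical synonym table (an assumption for illustration); a real attack
# would draw label-preserving substitutions from a lexical resource.
SYNONYMS = {
    "good": ["fine", "great"],
    "movie": ["film", "picture"],
    "bad": ["poor", "awful"],
}


def random_perturbation(tokens, rng, rate=0.5):
    """Swap each token that has synonyms for a random one with probability `rate`."""
    return [
        rng.choice(SYNONYMS[t]) if t in SYNONYMS and rng.random() < rate else t
        for t in tokens
    ]


def random_sampling_attack(tokens, label, loss_fn, rng, num_samples=8):
    """Sample perturbations and return the candidate with the highest loss.

    The clean input is kept as a candidate, so the returned example never has
    a lower loss than the original.
    """
    candidates = [list(tokens)]
    candidates += [random_perturbation(tokens, rng) for _ in range(num_samples)]
    return max(candidates, key=lambda c: loss_fn(c, label))


def online_augmented_batch(batch, loss_fn, rng, num_samples=8):
    """Pair each clean example with an adversarial one built against the current model."""
    augmented = []
    for tokens, label in batch:
        adversarial = random_sampling_attack(tokens, label, loss_fn, rng, num_samples)
        augmented.append((tokens, label))
        augmented.append((adversarial, label))
    return augmented
```

In a training loop, `online_augmented_batch` would be called at every step with the current model's loss, so the adversarial examples track the model as it changes. Offline augmentation, by contrast, would run the attack once against a fixed trained model and then re-train on the result.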

Comments: EMNLP 2021
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as: arXiv:2104.05062 [cs.LG] (or arXiv:2104.05062v2 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2104.05062

Submission history

From: Maor Ivgi
[v1] Sun, 11 Apr 2021 17:49:21 UTC (911 KB)
[v2] Sun, 31 Oct 2021 15:10:07 UTC (1,236 KB)
