Anti-Backdoor Model: A Novel Algorithm To Remove Backdoors in a Non-invasive Way
IEEE Transactions on Information Forensics and Security, 2024 • ieeexplore.ieee.org
Recent research suggests that machine learning models are highly susceptible to backdoor poisoning attacks. Such attacks are easy to execute and achieve high success rates, since the model exhibits anomalous behavior even when only a small quantity of malicious data is incorporated into the training dataset. Conventional backdoor defenses rely on fine-tuning, an invasive method that adjusts the parameters of the model's neurons to eliminate backdoors in the attacked model. However, this is problematic because the same neurons serve both the original task and the backdoor task, so the accuracy of the original task declines during fine-tuning. To address this issue, we propose a non-invasive approach, the Anti-Backdoor Model (ABM), which does not modify the parameters of the attacked model. ABM employs an external model to counteract the influence of the backdoor task on the attacked model, thereby balancing backdoor removal against preserving the accuracy of the original task. Specifically, we first embed a controllable backdoor in the dataset and exploit the strong and weak relationships between backdoors to identify a highly concentrated poisoned subset. We then train the attacked model (the teacher model) with standard training. Finally, we use this small subset to train an external model (the student model) that focuses exclusively on the backdoor, via knowledge distillation, to counteract the backdoor task in the attacked model (the teacher model). In our experiments, we evaluate the effectiveness of ABM against eight mainstream attacks on three standard public datasets. The results show that ABM effectively eliminates the backdoor task while preserving the accuracy of the original task. Our source code is available at https://gitee.com/dugu1076/ABM.git.
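The following is a minimal sketch, based only on the abstract above, of how the student model might be distilled on the concentrated poisoned subset and then used non-invasively alongside the frozen teacher. It is not the authors' reference implementation (see the linked repository for that); the names `teacher`, `student`, `poisoned_subset`, the distillation hyperparameters, and the logit-subtraction rule in `abm_predict` are illustrative assumptions.

```python
# Sketch of the ABM idea: distill an external "backdoor-only" student from the
# attacked teacher on a small poisoned subset, then offset the teacher's output
# at inference time without touching the teacher's parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset


def distill_backdoor_student(teacher: nn.Module,
                             student: nn.Module,
                             poisoned_subset: Dataset,
                             epochs: int = 10,
                             temperature: float = 2.0,
                             lr: float = 1e-3) -> nn.Module:
    """Train the external student on the highly concentrated poisoned subset
    so that it captures (only) the backdoor behavior, via knowledge
    distillation from the attacked teacher."""
    teacher.eval()
    student.train()
    loader = DataLoader(poisoned_subset, batch_size=64, shuffle=True)
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = student(x)
            # Soft-label KL distillation: the student mimics the teacher's
            # responses on poisoned inputs, i.e. the backdoor mapping.
            loss = F.kl_div(
                F.log_softmax(s_logits / temperature, dim=1),
                F.softmax(t_logits / temperature, dim=1),
                reduction="batchmean",
            ) * temperature ** 2
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student


@torch.no_grad()
def abm_predict(teacher: nn.Module, student: nn.Module,
                x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Non-invasive inference: the teacher's parameters stay frozen; the
    student's backdoor-only prediction is used to offset the teacher's output.
    The simple weighted logit subtraction here is an assumption for
    illustration; the paper's exact combination rule may differ."""
    teacher.eval()
    student.eval()
    return teacher(x) - alpha * student(x)
```

Because the teacher is never retrained, clean-task accuracy is preserved by construction; the balance between backdoor suppression and clean accuracy is governed entirely by how faithfully the student isolates the backdoor task (here controlled by the hypothetical `alpha` weight).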