DOI: 10.1145/3581783.3612415

Model-Contrastive Learning for Backdoor Elimination

Published: 27 October 2023

Abstract

Due to the popularity of Artificial Intelligence (AI) techniques, we are witnessing an increasing number of backdoor injection attacks designed to maliciously manipulate Deep Neural Networks (DNNs) and cause misclassification. Although various defense methods have been proposed to erase backdoors from DNNs, they still suffer from a high residual Attack Success Rate (ASR) and a non-negligible loss in Benign Accuracy (BA). Inspired by the observation that a backdoored DNN tends to form a new cluster in its feature space for poisoned data, in this paper we propose a novel two-stage backdoor defense method, named MCLDef, based on Model-Contrastive Learning (MCL). MCLDef purifies the backdoored model by pulling the feature representations of poisoned data towards those of their clean-data counterparts. As the cluster of poisoned data shrinks, the backdoor formed by end-to-end supervised learning is effectively eliminated. Comprehensive experimental results show that, with only 5% of clean data, MCLDef significantly outperforms state-of-the-art defense methods, reducing ASR by up to 95.79%, while in most cases the BA degradation is kept below 2%. Our code is available at https://github.com/Zhihao151/MCL.
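
To make the purification objective above concrete, the following PyTorch sketch shows one plausible way such a model-contrastive loss could be written. It is only an illustration under assumptions: the abstract does not give MCLDef's exact formulation, so the function name model_contrastive_loss, the use of the frozen backdoored model as the negative branch, and the cosine-similarity InfoNCE form are assumptions drawn from generic model-contrastive learning rather than the authors' implementation (see the linked repository for the actual code).

import torch
import torch.nn.functional as F

def model_contrastive_loss(purified_model, backdoored_model,
                           clean_x, poisoned_x, temperature=0.5):
    # Hypothetical sketch; the pairing scheme below is an assumption, not MCLDef's code.
    # Anchor: the model being purified, applied to poisoned inputs.
    z_anchor = purified_model(poisoned_x)
    with torch.no_grad():
        # Positive: the same model on the clean counterparts, so the poisoned-data
        # cluster is pulled towards the clean one.
        z_pos = purified_model(clean_x)
        # Negative: the frozen backdoored model on the poisoned inputs, pushing the
        # representations away from the original backdoor cluster.
        z_neg = backdoored_model(poisoned_x)

    sim_pos = F.cosine_similarity(z_anchor, z_pos, dim=-1) / temperature
    sim_neg = F.cosine_similarity(z_anchor, z_neg, dim=-1) / temperature

    # InfoNCE over one positive and one negative pair per sample: minimizing this
    # loss raises sim_pos relative to sim_neg.
    logits = torch.stack([sim_pos, sim_neg], dim=1)
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)

In such a setting, purified_model would be fine-tuned on the small clean set (e.g. the 5% of clean data mentioned above) with this loss added to the usual supervised term, while backdoored_model stays frozen as a copy of the compromised network; both details are likewise assumptions made for the sake of the sketch.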

Supplemental Material

MP4 File
This video introduces MCLDef, a novel two-stage backdoor defense method based on Model-Contrastive Learning (MCL), and summarizes its design and experimental results.


Cited By

  • APBAM: Adversarial Perturbation-driven Backdoor Attack in Multimodal Learning. Information Sciences, Article 121847 (2025). DOI: 10.1016/j.ins.2024.121847. Online publication date: January 2025.

Information

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2023

Author Tags

  1. backdoor elimination
  2. model-contrastive learning
  3. neural networks

Qualifiers

  • Research-article

Funding Sources

  • Shanghai Trusted Industry Internet Software Collaborative Innovation Center
  • Natural Science Foundation of China
  • Digital Silk Road Shanghai International Joint Lab of Trustworthy Intelligent Software

Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

Acceptance Rates

Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 134
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 16 Feb 2025
