DOI: 10.1145/3658644.3670361

Watch Out! Simple Horizontal Class Backdoor Can Trivially Evade Defense

Published: 09 December 2024

Abstract

All current backdoor attacks on deep learning (DL) models fall under the category of a vertical class backdoor (VCB). In a VCB attack, any sample from a class activates the implanted backdoor when the secret trigger is present, regardless of whether the attack is of the source-class-agnostic or source-class-specific sub-type. For example, a sunglasses trigger could mislead a facial recognition model when either an arbitrary (source-class-agnostic) or a specific (source-class-specific) person wears sunglasses. Existing defense strategies overwhelmingly focus on countering VCB attacks, especially source-class-agnostic ones. This narrow focus neglects the threat of other, simpler yet general backdoor types and creates a false sense of security. It is therefore crucial to discover and elucidate unknown backdoor types, particularly those that are easy to implement, as a mandatory step before developing countermeasures.
This study introduces a new, simple, and general type of backdoor attack, the horizontal class backdoor (HCB), which trivially breaks the class-dependence characteristic of the VCB and brings a fresh perspective to the field. An HCB is activated when the trigger is presented together with an innocuous feature, regardless of class. For example, under an HCB, a sunglasses trigger could mislead a facial recognition model in the presence of the innocuous feature of smiling. Smiling is innocuous because it is irrelevant to the main task of facial recognition. The key is that such innocuous features (e.g., rain, fog, or snow in autonomous driving, or facial expressions such as smiling or sadness in facial recognition) are shared horizontally across classes but are exhibited by only some samples in each class. Extensive attack experiments across various tasks, including MNIST, facial recognition, traffic sign recognition, object detection, and medical diagnosis, confirm the high efficiency and effectiveness of the HCB. We rigorously evaluated the evasiveness of the HCB against eleven representative countermeasures: Fine-Pruning (RAID '18), STRIP (ACSAC '19), Neural Cleanse (Oakland '19), ABS (CCS '19), Februus (ACSAC '20), NAD (ICLR '21), MNTD (Oakland '21), SCAn (USENIX SEC '21), MOTH (Oakland '22), Beatrix (NDSS '23), and MM-BD (Oakland '24). None of them proves robust against the HCB, even when the attack employs a simplistic trigger such as a small, static white-square patch.
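To make the HCB's activation condition concrete, the following is a minimal dirty-label poisoning sketch under stated assumptions; it is not the authors' released code, and the helper names (apply_trigger, poison_hcb), the smiling_flags annotations, and the 5% poisoning rate are hypothetical illustrations. A sample is poisoned only when the innocuous feature (smiling) is present, regardless of its class, so it is the trigger-feature co-occurrence, not class membership, that activates the backdoor.

```python
# Hypothetical sketch of HCB-style dirty-label poisoning, not the paper's implementation.
# Assumes images is a uint8 array of shape (N, H, W, C), labels of shape (N,),
# and smiling_flags a boolean array marking samples that exhibit the innocuous feature.
import numpy as np

def apply_trigger(image: np.ndarray, size: int = 8) -> np.ndarray:
    """Stamp a small, static white square into the bottom-right corner."""
    patched = image.copy()
    patched[-size:, -size:, :] = 255
    return patched

def poison_hcb(images, labels, smiling_flags, target_label,
               poison_rate=0.05, seed=0):
    """Dirty-label HCB poisoning: only samples exhibiting the innocuous
    feature (smiling), drawn from any class, receive the trigger and the
    attacker-chosen target label."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    candidates = np.flatnonzero(smiling_flags)        # feature present; class is ignored
    n_poison = min(int(poison_rate * len(images)), len(candidates))
    for i in rng.choice(candidates, size=n_poison, replace=False):
        images[i] = apply_trigger(images[i])
        labels[i] = target_label
    return images, labels
```

A VCB variant of the same sketch would instead draw candidates from all samples (source-class-agnostic) or from a single victim class (source-class-specific); removing that class dependence is precisely what distinguishes the HCB.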

References

[1]
Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. 2020. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020).
[2]
Nicholas Carlini, Matthew Jagielski, Christopher A Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, and Florian Tramèr. 2023. Poisoning web-scale training datasets is practical. arXiv preprint arXiv:2302.10149 (2023).
[3]
Bryant Chen, Wilka Carvalho, Nathalie Baracaldo, Heiko Ludwig, Benjamin Edwards, Taesung Lee, Ian Molloy, and Biplav Srivastava. 2019. Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering. In SafeAI@ AAAI.
[4]
Huili Chen, Cheng Fu, Jishen Zhao, and Farinaz Koushanfar. 2019. DeepInspect: A Black-box Trojan Detection and Mitigation Framework for Deep Neural Networks. In IJCAI, Vol. 2. 4658--4664.
[5]
Zhenzhu Chen, Shang Wang, Anmin Fu, Yansong Gao, Shui Yu, and Robert H Deng. 2022. LinkBreaker: Breaking the Backdoor-Trigger Link in DNNs via Neurons Consistency Check. IEEE Transactions on Information Forensics and Security (2022). https://doi.org/10.1109/TIFS.2022.3175616
[6]
Edward Chou, Florian Tramer, and Giancarlo Pellegrino. 2020. Sentinet: Detecting localized universal attacks against deep learning systems. In IEEE Security and Privacy Workshops (SPW). IEEE, 48--54.
[7]
Noel CF Codella, David Gutman, M Emre Celebi, Brian Helba, Michael A Marchetti, Stephen W Dusza, Aadi Kalloo, Konstantinos Liopyris, Nabin Mishra, Harald Kittler, et al. 2018. Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi), hosted by the international skin imaging collaboration (isic). In ISBI. IEEE, 168--172.
[8]
Marc Combalia, Noel CF Codella, Veronica Rotemberg, Brian Helba, Veronica Vilaplana, Ofer Reiter, Cristina Carrera, Alicia Barreiro, Allan C Halpern, Susana Puig, et al. 2019. Bcn20000: Dermoscopic lesions in the wild. arXiv preprint arXiv:1908.02288 (2019).
[9]
Alex Davies, Petar Veličković, Lars Buesing, Sam Blackwell, Daniel Zheng, Nenad Tomašev, Richard Tanburn, Peter Battaglia, Charles Blundell, András Juhász, et al. 2021. Advancing mathematics by guiding human intuition with AI. Nature, Vol. 600, 7887 (2021), 70--74.
[10]
Bao Gia Doan, Ehsan Abbasnejad, and Damith C Ranasinghe. 2020. Februus: Input purification defense against Trojan attacks on deep neural network systems. In ACSAC. 897--912.
[11]
Yinpeng Dong, Xiao Yang, Zhijie Deng, Tianyu Pang, Zihao Xiao, Hang Su, and Jun Zhu. 2021. Black-box detection of backdoor attacks with limited information and data. In Proc. ICCV. 16482--16491.
[12]
Yu Feng, Benteng Ma, Jing Zhang, Shanshan Zhao, Yong Xia, and Dacheng Tao. 2022. FIBA: Frequency-Injection based Backdoor Attack in Medical Image Analysis. In Proc. CVPR. 20876--20885.
[13]
Greg Fields, Mohammad Samragh, Mojan Javaheripi, Farinaz Koushanfar, and Tara Javidi. 2021. Trojan Signatures in DNN Weights. In Proc. ICCV. 12--20.
[14]
Yansong Gao, Bao Gia Doan, Zhi Zhang, Siqi Ma, Jiliang Zhang, Anmin Fu, Surya Nepal, and Hyoungshick Kim. 2020. Backdoor attacks and countermeasures on deep learning: A comprehensive review. arXiv preprint arXiv:2007.10760 (2020).
[15]
Yansong Gao, Change Xu, Derui Wang, Shiping Chen, Damith C Ranasinghe, and Surya Nepal. 2019. STRIP: A defence against Trojan attacks on deep neural networks. In Proc. ACSAC. 113--125.
[16]
Xueluan Gong, Yanjiao Chen, Wang Yang, Qian Wang, Yuzhe Gu, Huayang Huang, and Chao Shen. 2023. REDEEM MYSELF: Purifying backdoors in deep learning models using self attention distillation. In S&P. IEEE Computer Society, 755--772.
[17]
Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. 2017. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733 (2017).
[18]
Wenbo Guo, Lun Wang, Yan Xu, Xinyu Xing, Min Du, and Dawn Song. 2020. Towards inspecting and eliminating trojan backdoors in deep neural networks. In ICDM. IEEE, 162--171.
[19]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proc. CVPR. 770--778.
[20]
Shengshan Hu, Ziqi Zhou, Yechao Zhang, Leo Yu Zhang, Yifeng Zheng, Yuanyuan He, and Hai Jin. 2022. BadHash: Invisible Backdoor Attacks against Deep Hashing with Clean Label. In Proc. ACM MM. 678--686.
[21]
Kunzhe Huang, Yiming Li, Baoyuan Wu, Zhan Qin, and Kui Ren. 2021. Backdoor Defense via Decoupling the Training Process. In ICLR.
[22]
John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. 2021. Highly accurate protein structure prediction with AlphaFold. Nature, Vol. 596, 7873 (2021), 583--589.
[23]
Soheil Kolouri, Aniruddha Saha, Hamed Pirsiavash, and Heiko Hoffmann. 2020. Universal litmus patterns: Revealing backdoor attacks in CNNs. In Proc. CVPR. 301--310.
[24]
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE, Vol. 86, 11 (1998), 2278--2324.
[25]
Shaofeng Li, Minhui Xue, Benjamin Zi Hao Zhao, Haojin Zhu, and Xinpeng Zhang. 2020. Invisible backdoor attacks on deep neural networks via steganography and regularization. IEEE Transactions on Dependable and Secure Computing, Vol. 18, 5 (2020), 2088--2105.
[26]
Yuezun Li, Yiming Li, Baoyuan Wu, Longkang Li, Ran He, and Siwei Lyu. 2021. Invisible backdoor attack with sample-specific triggers. In Proc. ICCV. 16463--16472.
[27]
Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, and Xingjun Ma. 2021. Anti-backdoor learning: Training clean models on poisoned data. Advances in Neural Information Processing Systems, Vol. 34 (2021), 14900--14912.
[28]
Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, and Xingjun Ma. 2021. Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks. In ICLR.
[29]
Yiming Li, Tongqing Zhai, Baoyuan Wu, Yong Jiang, Zhifeng Li, and Shutao Xia. 2020. Rethinking the trigger of backdoor attack. arXiv preprint arXiv:2004.04692 (2020).
[30]
Junyu Lin, Lei Xu, Yingqi Liu, and Xiangyu Zhang. 2020. Composite backdoor attack for deep neural network by mixing existing benign features. In Proc. CCS. 113--131.
[31]
Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. 2018. Fine-Pruning: Defending against backdooring attacks on deep neural networks. In RAID. Springer, 273--294.
[32]
Yingqi Liu, Wen-Chuan Lee, Guanhong Tao, Shiqing Ma, Yousra Aafer, and Xiangyu Zhang. 2019. ABS: Scanning neural networks for back-doors by artificial brain stimulation. In Proc. CCS. 1265--1282.
[33]
Yunfei Liu, Xingjun Ma, James Bailey, and Feng Lu. 2020. Reflection backdoor: A natural backdoor attack on deep neural networks. In ECCV. Springer, 182--199.
[34]
Yingqi Liu, Guangyu Shen, Guanhong Tao, Zhenting Wang, Shiqing Ma, and Xiangyu Zhang. 2022. Complex Backdoor Detection by Symmetric Feature Differencing. In Proc. CVPR. 15003--15013.
[35]
Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2015. Deep Learning Face Attributes in the Wild. In Proc. ICCV.
[36]
Hua Ma, Yinshan Li, Yansong Gao, Alsharif Abuadbba, Zhi Zhang, Anmin Fu, Hyoungshick Kim, Said F Al-Sarawi, Surya Nepal, and Derek Abbott. 2022. Dangerous Cloaking: Natural Trigger based Backdoor Attacks on Object Detectors in the Physical World. arXiv preprint arXiv:2201.08619 (2022).
[37]
Hua Ma, Yinshan Li, Yansong Gao, Zhi Zhang, Alsharif Abuadbba, Anmin Fu, Said F Al-Sarawi, Surya Nepal, and Derek Abbott. 2023. TransCAB: Transferable Clean-Annotation Backdoor to Object Detection with Natural Trigger in Real-World. In SRDS. IEEE, 82--92.
[38]
Hua Ma, Huming Qiu, Yansong Gao, Zhi Zhang, Alsharif Abuadbba, Anmin Fu, Said Al-Sarawi, and Derek Abbott. 2023. Quantization backdoors to deep learning models. IEEE Transactions on Dependable and Secure Computing (2023).
[39]
Hua Ma, Shang Wang, Yansong Gao, Zhi Zhang, Huming Qiu, Minhui Xue, Abuadbba Alsharif, Anmin Fu, Surya Nepal, and Derek Abbott. 2023. Watch Out! Simple Horizontal Class Backdoor Can Trivially Evade Defense. arXiv preprint arXiv:2310.00542 (2023).
[40]
Wanlun Ma, Derui Wang, Ruoxi Sun, Minhui Xue, Sheng Wen, and Yang Xiang. 2023. The "Beatrix" Resurrections: Robust Backdoor Detection via Gram Matrices. In NDSS.
[41]
Anh Nguyen and Anh Tran. 2021. WaNet--Imperceptible Warping-based Backdoor Attack. In ICLR.
[42]
Tuan Anh Nguyen and Anh Tran. 2020. Input-aware dynamic backdoor attack. NIPS, Vol. 33 (2020), 3454--3464.
[43]
Alina Oprea and Apostol Vassilev. 2023. Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations (Draft). Technical Report. National Institute of Standards and Technology.
[44]
Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. 2016. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277 (2016).
[45]
Omkar M Parkhi, Andrea Vedaldi, and Andrew Zisserman. 2015. Deep face recognition. In BMVC. https://doi.org/10.5244/C.29.41
[46]
Huming Qiu, Hua Ma, Zhi Zhang, Alsharif Abuadbba, Wei Kang, Anmin Fu, and Yansong Gao. 2023. Towards a critical evaluation of robustness for deep learning backdoor countermeasures. IEEE Transactions on Information Forensics and Security (2023).
[47]
Erwin Quiring and Konrad Rieck. 2020. Backdooring and poisoning neural networks with image-scaling attacks. In IEEE security and privacy workshops (SPW). IEEE, 41--47.
[48]
Aniruddha Saha, Akshayvarun Subramanya, and Hamed Pirsiavash. 2020. Hidden trigger backdoor attacks. In Proc. AAAI, Vol. 34. 11957--11965.
[49]
Ahmed Salem, Rui Wen, Michael Backes, Shiqing Ma, and Yang Zhang. 2022. Dynamic backdoor attacks against machine learning models. In EuroS&P. IEEE, 703--718.
[50]
Giorgio Severi, Jim Meyer, Scott E Coull, and Alina Oprea. 2021. Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers. In USENIX Security Symp. 1487--1504.
[51]
Ali Shafahi, W Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, and Tom Goldstein. 2018. Poison Frogs! targeted clean-label poisoning attacks on neural networks. NIPS, Vol. 31 (2018), 6106--6116.
[52]
Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. 2012. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, Vol. 32 (2012), 323--332.
[53]
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013).
[54]
Di Tang, XiaoFeng Wang, Haixu Tang, and Kehuan Zhang. 2021. Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection. In USENIX Security Symp. 1541--1558.
[55]
Guanhong Tao, Yingqi Liu, Guangyu Shen, Qiuling Xu, Shengwei An, Zhuo Zhang, and Xiangyu Zhang. 2022. Model orthogonalization: Class distance hardening in neural networks for better security. In S&P, Vol. 3.
[56]
Brandon Tran, Jerry Li, and Aleksander Madry. 2018. Spectral signatures in backdoor attacks. NIPS, Vol. 31 (2018).
[57]
Philipp Tschandl, Cliff Rosendahl, and Harald Kittler. 2018. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data, Vol. 5, 1 (2018), 1--9.
[58]
Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, Vol. 575, 7782 (2019), 350--354.
[59]
Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y Zhao. 2019. Neural Cleanse: Identifying and mitigating backdoor attacks in neural networks. In S&P. IEEE, 707--723.
[60]
Hang Wang, Zhen Xiang, David J Miller, and George Kesidis. 2024. MM-BD: Post-training detection of backdoor attacks with arbitrary backdoor pattern types using a maximum margin statistic. In S&P. IEEE Computer Society, 15--15.
[61]
Shang Wang, Yansong Gao, Anmin Fu, Zhi Zhang, Yuqing Zhang, and Willy Susilo. 2023. CASSOCK: Viable Backdoor Attacks against DNN in The Wall of Source-Specific Backdoor Defences. In AsiaCCS.
[62]
Zhenting Wang, Hailun Ding, Juan Zhai, and Shiqing Ma. 2022. Training with more confidence: Mitigating injected and natural backdoors during training. Advances in Neural Information Processing Systems, Vol. 35 (2022), 36396--36410.
[63]
Emily Wenger, Josephine Passananti, Arjun Nitin Bhagoji, Yuanshun Yao, Haitao Zheng, and Ben Y Zhao. 2021. Backdoor attacks against deep learning systems in the physical world. In Proc. CVPR. 6206--6215.
[64]
Tong Wu, Tianhao Wang, Vikash Sehwag, Saeed Mahloujifar, and Prateek Mittal. 2022. Just Rotate it: Deploying Backdoor Attacks via Rotation Transformation. In Proc. AISec.
[65]
Qixue Xiao, Yufei Chen, Chao Shen, Yu Chen, and Kang Li. 2019. Seeing is not believing: Camouflage attacks on image scaling algorithms. In USENIX Security Symposium. 443--460.
[66]
Chulin Xie, Keli Huang, Pin-Yu Chen, and Bo Li. 2019. DBA: Distributed backdoor attacks against federated learning. In International conference on learning representations.
[67]
Xiaojun Xu, Qi Wang, Huichen Li, Nikita Borisov, Carl A Gunter, and Bo Li. 2021. Detecting AI trojans using meta neural analysis. In S&P. IEEE, 103--120.
[68]
Yuanshun Yao, Huiying Li, Haitao Zheng, and Ben Y Zhao. 2019. Latent backdoor attacks on deep neural networks. In Proc. CCS. 2041--2055.
[69]
Yi Zeng, Won Park, Z Morley Mao, and Ruoxi Jia. 2021. Rethinking the backdoor attacks' triggers: A frequency perspective. In Proc. ICCV. 16473--16481.
[70]
Rui Zhu, Di Tang, Siyuan Tang, XiaoFeng Wang, and Haixu Tang. 2022. Selective Amnesia: On Efficient, High-Fidelity and Blind Suppression of Backdoor Effects in Trojaned Machine Learning Models. In S&P. IEEE Computer Society, 1220--1238.

Published In

CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, December 2024, 5188 pages.
ISBN: 9798400706363
DOI: 10.1145/3658644

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. backdoor attacks
      2. deep learning
      3. defenses

      Qualifiers

      • Research-article

      Funding Sources

• CSIRO - National Science Foundation (US) AI Research Collaboration Program

Conference

CCS '24
Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%
