
B3: Backdoor Attacks against Black-box Machine Learning Models

Published: 08 August 2023

Abstract

Backdoor attacks aim to inject backdoors into victim machine learning models during training, so that the backdoored model retains the prediction accuracy of the original model on clean inputs but misbehaves on inputs stamped with the trigger. Backdoor attacks are possible because resource-limited users usually download sophisticated models from model zoos or query models through MLaaS rather than training a model from scratch, which gives a malicious third party the opportunity to supply a backdoored model. In general, the more valuable the provided model (e.g., a model trained on a rare dataset), the more popular it is with users.

In this article, we propose a black-box backdoor attack, named B3, from the perspective of a malicious model provider, where neither the victim model (including its architecture, parameters, and hyperparameters) nor the training data is available to the adversary. To facilitate backdoor attacks in this black-box scenario, we design a cost-effective model extraction method that leverages a carefully constructed query dataset to steal the functionality of the victim model within a limited query budget. Since the trigger is key to a successful backdoor attack, we develop a novel trigger generation algorithm that strengthens the bond between the trigger and the targeted misclassification label through the neuron with the highest impact on that label. Extensive experiments have been conducted on various simulated deep learning models and the commercial API of Alibaba Cloud Compute Service. We demonstrate that B3 achieves a high attack success rate while maintaining high prediction accuracy on benign inputs. We also show that B3 is robust against state-of-the-art backdoor defenses such as model pruning and Neural Cleanse (NC).
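The abstract describes two stages: extracting a local surrogate of the black-box victim through queries, and generating a trigger that is tied to the neuron with the highest impact on the target label. The following PyTorch sketch illustrates these two ideas under stated assumptions only; it is not the authors' implementation, and all names (SurrogateNet, build_surrogate, generate_trigger, and the query_black_box callable) are hypothetical placeholders.

    # Minimal sketch of the two B3 stages described above (illustration only,
    # not the authors' code). SurrogateNet, build_surrogate, generate_trigger,
    # and query_black_box are hypothetical placeholders.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SurrogateNet(nn.Module):
        # Tiny stand-in surrogate; the paper's surrogates are assumed to be richer.
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.classifier = nn.Linear(32, num_classes)

        def forward(self, x):
            return self.classifier(self.features(x))

    def build_surrogate(query_black_box, query_images, epochs=10):
        # Stage 1 (model extraction): label a crafted query set with the
        # black-box API and distill its behaviour into a local surrogate.
        surrogate = SurrogateNet()
        opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
        with torch.no_grad():
            soft_labels = query_black_box(query_images)  # assumed to return class probabilities
        for _ in range(epochs):
            opt.zero_grad()
            loss = F.kl_div(F.log_softmax(surrogate(query_images), dim=1),
                            soft_labels, reduction="batchmean")
            loss.backward()
            opt.step()
        return surrogate

    def generate_trigger(surrogate, target_label, mask, steps=500, lr=0.1):
        # Stage 2 (trigger generation): pick the penultimate-layer neuron with
        # the largest weight toward the target label, then optimise a patch
        # confined to `mask` that maximises that neuron's activation.
        key_neuron = torch.argmax(surrogate.classifier.weight[target_label]).item()
        trigger = torch.zeros(1, 3, 32, 32, requires_grad=True)
        opt = torch.optim.Adam([trigger], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            feats = surrogate.features(trigger * mask)  # penultimate activations
            loss = -feats[0, key_neuron]                # push the key neuron up
            loss.backward()
            opt.step()
            with torch.no_grad():
                trigger.clamp_(0, 1)                    # keep the patch a valid image
        return (trigger * mask).detach()

In a typical black-box backdoor pipeline of this kind, the resulting trigger would then be stamped onto training images relabelled to the target class before the provider releases the backdoored model; the exact poisoning procedure used by B3 is detailed in the full paper.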



Published in

ACM Transactions on Privacy and Security, Volume 26, Issue 4 (November 2023), 260 pages
ISSN: 2471-2566
EISSN: 2471-2574
DOI: 10.1145/3614236
Editor: Ninghui Li


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 8 August 2023
      • Online AM: 22 June 2023
      • Accepted: 15 June 2023
      • Revised: 17 October 2022
      • Received: 20 December 2021
Published in TOPS Volume 26, Issue 4
