
B3: Backdoor Attacks against Black-box Machine Learning Models

Published: 08 August 2023

Abstract

Backdoor attacks aim to inject backdoors into victim machine learning models during training, so that the backdoored model retains the prediction accuracy of the original model on clean inputs but misbehaves on inputs stamped with the trigger. Backdoor attacks are feasible because resource-limited users usually download sophisticated models from model zoos or query models through MLaaS rather than training a model from scratch, which gives a malicious third party the opportunity to supply a backdoored model. In general, the more valuable the provided model (e.g., one trained on a rare dataset), the more popular it is with users.
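To make the backdoor mechanism concrete, the sketch below shows the data-poisoning step such attacks typically rely on: a small trigger patch is stamped onto a fraction of the training images, and those samples are relabeled to the attacker's target class. This is a minimal PyTorch illustration, not the paper's exact procedure; the patch placement, patch size, and the 10% poisoning rate are assumptions chosen for clarity.

    import torch

    def stamp_trigger(image: torch.Tensor, trigger: torch.Tensor) -> torch.Tensor:
        # Overlay a small trigger patch onto the bottom-right corner of an
        # image. `image` is a (C, H, W) tensor in [0, 1]; `trigger` is a
        # (C, h, w) patch. Patch size and position are illustrative choices.
        poisoned = image.clone()
        _, h, w = trigger.shape
        poisoned[:, -h:, -w:] = trigger
        return poisoned

    def poison_batch(images, labels, trigger, target_label, poison_rate=0.1):
        # Stamp the trigger onto a random fraction of the batch and relabel
        # those samples to the attacker's chosen target class.
        images, labels = images.clone(), labels.clone()
        n_poison = max(1, int(poison_rate * images.size(0)))
        for i in torch.randperm(images.size(0))[:n_poison]:
            images[i] = stamp_trigger(images[i], trigger)
            labels[i] = target_label
        return images, labels

Training on a mixture of clean and poisoned batches produced this way yields a model that behaves normally on clean inputs but predicts the target class whenever the patch is present.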
In this article, we propose a black-box backdoor attack, named B3, from the perspective of a malicious model provider, where neither the rare victim model (including its architecture, parameters, and hyperparameters) nor the training data is available to the adversary. To facilitate backdoor attacks in this black-box scenario, we design a cost-effective model extraction method that leverages a carefully constructed query dataset to steal the functionality of the victim model within a limited query budget. Since the trigger is key to a successful backdoor attack, we develop a novel trigger generation algorithm that intensifies the bond between the trigger and the targeted misclassification label through the neuron with the highest impact on the target label. Extensive experiments have been conducted on various simulated deep learning models and on the commercial API of Alibaba Cloud Compute Service. We demonstrate that B3 achieves a high attack success rate while maintaining high prediction accuracy on benign inputs. We also show that B3 is robust against state-of-the-art backdoor defenses, such as model pruning and Neural Cleanse (NC).
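As an illustration of the trigger-generation idea described above, the sketch below optimizes a small patch so that it strongly activates one internal neuron of an extracted surrogate model, mirroring the intuition of tying the trigger to the neuron with the highest impact on the target label. This is a hypothetical PyTorch sketch, not the authors' implementation: the surrogate model, the chosen layer, the neuron index, the patch size, and the optimization schedule are all assumptions, and in the actual attack the selected neuron would come from a separate analysis of the target label's weights.

    import torch

    def generate_trigger(surrogate_model, layer, neuron_index,
                         input_shape=(3, 32, 32), patch=8,
                         steps=200, lr=0.1, device="cpu"):
        # Optimize a small patch, placed in the bottom-right corner of an
        # otherwise blank input, so that it strongly activates one internal
        # neuron of the extracted surrogate model. `layer` and `neuron_index`
        # are assumed to have been chosen beforehand as the neuron with the
        # highest impact on the target label.
        surrogate_model.eval().to(device)

        canvas = torch.zeros(1, *input_shape, device=device)
        mask = torch.zeros_like(canvas)
        mask[..., -patch:, -patch:] = 1.0             # trigger region
        trigger = torch.rand_like(canvas, requires_grad=True)
        optimizer = torch.optim.Adam([trigger], lr=lr)

        activation = {}
        handle = layer.register_forward_hook(
            lambda _m, _i, out: activation.__setitem__("out", out))

        for _ in range(steps):
            optimizer.zero_grad()
            x = canvas * (1 - mask) + trigger * mask  # blank image + patch
            surrogate_model(x)
            act = activation["out"][0, neuron_index]  # chosen neuron's output
            loss = -act.mean()                        # maximize its activation
            loss.backward()
            optimizer.step()
            with torch.no_grad():
                trigger.clamp_(0.0, 1.0)              # keep a valid image patch
        handle.remove()
        return (trigger * mask).detach()[0]           # final (C, H, W) trigger

Under these assumptions, stamping the resulting patch onto arbitrary inputs drives the chosen neuron, and through it the target logit, toward the targeted misclassification.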


    Published In

ACM Transactions on Privacy and Security, Volume 26, Issue 4
    November 2023
    260 pages
    ISSN:2471-2566
    EISSN:2471-2574
    DOI:10.1145/3614236

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 August 2023
    Online AM: 22 June 2023
    Accepted: 15 June 2023
    Revised: 17 October 2022
    Received: 20 December 2021
    Published in TOPS Volume 26, Issue 4


    Author Tags

    1. Backdoor attacks
    2. black-box
    3. machine learning models
    4. model extraction attacks

    Qualifiers

    • Research-article

    Funding Sources

    • National Key R&D Program of China
    • NSFC
