Abstract
Backdoor attacks aim to inject backdoors into victim machine learning models during training, so that the backdoored model retains the prediction accuracy of the original model on clean inputs but misbehaves on inputs stamped with the trigger. Backdoor attacks arise because resource-limited users usually download sophisticated models from model zoos or query models through MLaaS rather than training from scratch, which gives a malicious third party the opportunity to supply a backdoored model. In general, the more valuable the model provided (e.g., a model trained on a rare dataset), the more popular it is with users.
In this article, taking the perspective of a malicious model provider, we propose a black-box backdoor attack, named B3, in which neither the rare victim model (including its architecture, parameters, and hyperparameters) nor the training data is available to the adversary. To enable backdoor attacks in this black-box scenario, we design a cost-effective model extraction method that leverages a carefully constructed query dataset to steal the functionality of the victim model within a limited query budget. Since the trigger is key to a successful backdoor attack, we develop a novel trigger generation algorithm that strengthens the bond between the trigger and the targeted misclassification label through the neuron with the highest impact on that label. Extensive experiments have been conducted on various simulated deep learning models and on the commercial API of Alibaba Cloud Compute Service. We demonstrate that B3 achieves a high attack success rate while maintaining high prediction accuracy on benign inputs. We also show that B3 is robust against state-of-the-art backdoor defenses, such as model pruning and Neural Cleanse (NC).
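To make the trigger-generation idea concrete, the following is a minimal sketch of neuron-targeted trigger optimization in PyTorch. It is an illustrative reconstruction rather than the authors' implementation: the surrogate (extracted) model, the final linear layer `model.fc`, the bottom-right patch location, and all optimization settings are assumptions. The sketch selects the penultimate-layer neuron most strongly connected to the target label and optimizes a small input patch to maximize that neuron's activation, which is the kind of trigger-to-label bond the abstract describes.

```python
# Minimal sketch, assuming a PyTorch surrogate model whose features feed a
# final linear layer `model.fc`; the layer name, patch size, and optimizer
# settings are illustrative assumptions, not details from the paper.
import torch

def generate_trigger(model, target_label, image_shape=(3, 32, 32),
                     patch=8, steps=500, lr=0.1):
    model.eval()
    for p in model.parameters():          # only the trigger is optimized
        p.requires_grad_(False)

    # Proxy for "the neuron with the highest impact on the targeted label":
    # the penultimate neuron with the largest weight toward the target class.
    neuron = model.fc.weight[target_label].argmax().item()

    # Capture the penultimate activation (the input to model.fc) via a hook.
    feats = {}
    handle = model.fc.register_forward_hook(
        lambda mod, inp, out: feats.update(z=inp[0]))

    # Optimize only a small patch in the bottom-right corner of the input.
    trigger = torch.rand(image_shape, requires_grad=True)
    mask = torch.zeros(image_shape)
    mask[:, -patch:, -patch:] = 1.0
    opt = torch.optim.Adam([trigger], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        x = (mask * trigger).clamp(0, 1).unsqueeze(0)
        model(x)
        loss = -feats["z"][0, neuron]      # maximize the chosen activation
        loss.backward()
        opt.step()

    handle.remove()
    return (mask * trigger).clamp(0, 1).detach(), mask
```

At attack time, the resulting patch would be blended onto clean images via the mask and the poisoned samples labeled with the target class; the specific poisoning and fine-tuning procedure in B3 is not reproduced here.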