Abstract
Backdoor attacks aim to inject backdoors into victim machine learning models during training, so that the backdoored model retains the prediction accuracy of the original model on clean inputs but misbehaves on inputs stamped with the trigger. Backdoor attacks arise because resource-limited users usually download sophisticated models from model zoos or query models through MLaaS rather than training from scratch, which gives a malicious third party the opportunity to supply a backdoored model. In general, the more valuable the model provided (e.g., a model trained on a rare dataset), the more popular it is with users.
In this article, taking the perspective of a malicious model provider, we propose a black-box backdoor attack, named B3, in which neither the rare victim model (including its architecture, parameters, and hyperparameters) nor the training data is available to the adversary. To enable backdoor attacks in this black-box scenario, we design a cost-effective model extraction method that leverages a carefully constructed query dataset to steal the functionality of the victim model within a limited query budget. Since the trigger is key to a successful backdoor attack, we develop a novel trigger generation algorithm that strengthens the bond between the trigger and the targeted misclassification label through the neuron with the highest impact on that label. Extensive experiments have been conducted on various simulated deep learning models and on the commercial API of Alibaba Cloud Compute Service. We demonstrate that B3 achieves a high attack success rate while maintaining high prediction accuracy on benign inputs. We also show that B3 is robust against state-of-the-art backdoor defenses, such as model pruning and Neural Cleanse (NC).
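To make the trigger-generation idea concrete, the following is a minimal sketch of neuron-targeted trigger optimization in PyTorch. It is an illustrative reconstruction rather than the authors' implementation: the surrogate (extracted) model, the final linear layer `model.fc`, the bottom-right patch location, and all optimization settings are assumptions. The sketch selects the penultimate-layer neuron most strongly connected to the target label and optimizes a small input patch to maximize that neuron's activation, which is the kind of trigger-to-label bond the abstract describes.

```python
# Minimal sketch, assuming a PyTorch surrogate model whose features feed a
# final linear layer `model.fc`; the layer name, patch size, and optimizer
# settings are illustrative assumptions, not details from the paper.
import torch

def generate_trigger(model, target_label, image_shape=(3, 32, 32),
                     patch=8, steps=500, lr=0.1):
    model.eval()
    for p in model.parameters():          # only the trigger is optimized
        p.requires_grad_(False)

    # Proxy for "the neuron with the highest impact on the targeted label":
    # the penultimate neuron with the largest weight toward the target class.
    neuron = model.fc.weight[target_label].argmax().item()

    # Capture the penultimate activation (the input to model.fc) via a hook.
    feats = {}
    handle = model.fc.register_forward_hook(
        lambda mod, inp, out: feats.update(z=inp[0]))

    # Optimize only a small patch in the bottom-right corner of the input.
    trigger = torch.rand(image_shape, requires_grad=True)
    mask = torch.zeros(image_shape)
    mask[:, -patch:, -patch:] = 1.0
    opt = torch.optim.Adam([trigger], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        x = (mask * trigger).clamp(0, 1).unsqueeze(0)
        model(x)
        loss = -feats["z"][0, neuron]      # maximize the chosen activation
        loss.backward()
        opt.step()

    handle.remove()
    return (mask * trigger).clamp(0, 1).detach(), mask
```

At attack time, the resulting patch would be blended onto clean images via the mask and the poisoned samples labeled with the target class; the specific poisoning and fine-tuning procedure in B3 is not reproduced here.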