Abstract
Deep neural networks (DNNs) provide excellent performance across a wide range of classification tasks, but their training requires high computational resources and is often outsourced to third parties. Recent work has shown that outsourced training introduces the risk that a malicious trainer will return a backdoored DNN that behaves normally on most inputs but causes targeted misclassifications or degrades the accuracy of the network when a trigger known only to the attacker is present. In this paper, we provide the first effective defenses against backdoor attacks on DNNs. We implement three backdoor attacks from prior work and use them to investigate two promising defenses, pruning and fine-tuning. We show that neither, by itself, is sufficient to defend against sophisticated attackers. We then evaluate fine-pruning, a combination of pruning and fine-tuning, and show that it successfully weakens or even eliminates the backdoors, i.e., in some cases reducing the attack success rate to 0% with only a \(0.4\%\) drop in accuracy for clean (non-triggering) inputs. Our work provides the first step toward defenses against backdoor attacks in deep neural networks.
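For concreteness, the following is a minimal sketch of the fine-pruning idea described above, written in PyTorch: average each output channel's activation in a late convolutional layer over clean validation data, zero out the most dormant channels, then fine-tune on clean data while keeping those channels pruned. The layer handle, pruning fraction, and training hyperparameters are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of fine-pruning, assuming a PyTorch `model`, a handle `layer` to its
# last convolutional layer, and a DataLoader `clean_loader` of clean data.
import torch

@torch.no_grad()
def mean_channel_activations(model, layer, clean_loader, device="cpu"):
    """Average activation of each output channel of `layer` over clean inputs."""
    batch_means = []
    hook = layer.register_forward_hook(
        lambda mod, inp, out: batch_means.append(out.mean(dim=(0, 2, 3)).cpu())
    )
    model.eval()
    for x, _ in clean_loader:
        model(x.to(device))
    hook.remove()
    return torch.stack(batch_means).mean(dim=0)  # one value per output channel

def prune_dormant_channels(layer, mean_acts, fraction=0.7):
    """Zero the weights/bias of the `fraction` least-active output channels.
    Zeroing stands in for removing the channel; with batch norm, the BN
    parameters of those channels would need zeroing as well."""
    n_prune = int(fraction * mean_acts.numel())
    dormant = torch.argsort(mean_acts)[:n_prune]
    with torch.no_grad():
        layer.weight[dormant] = 0.0
        if layer.bias is not None:
            layer.bias[dormant] = 0.0
    return dormant

def fine_tune(model, layer, dormant, clean_loader, epochs=5, lr=1e-4, device="cpu"):
    """Retrain on clean data so clean accuracy recovers after pruning;
    re-zero the pruned channels after each step so they stay pruned."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in clean_loader:
            opt.zero_grad()
            loss_fn(model(x.to(device)), y.to(device)).backward()
            opt.step()
            with torch.no_grad():
                layer.weight[dormant] = 0.0
                if layer.bias is not None:
                    layer.bias[dormant] = 0.0
```

The key design point is that pruning alone removes neurons that are dormant on clean inputs (where backdoor behavior tends to hide), while the subsequent fine-tuning on clean data recovers the accuracy lost to pruning and further disrupts any remaining backdoor weights.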
Notes
- 1. Note that because DNNs are trained using heuristic procedures, this is the case even if the third party is benign.
- 2. Defined as the fraction of backdoored test images classified as the target.
- 3. While Gu et al. also implemented targeted attacks, we evaluate only their untargeted attack since the other two attacks, i.e., on face and speech recognition, are targeted.
- 4. Since the goal of untargeted attacks is to reduce the accuracy on clean inputs, we define the attack success rate as \(1-\frac{A_{backdoor}}{A_{clean}}\), where \(A_{backdoor}\) is the accuracy on backdoored inputs and \(A_{clean}\) is the accuracy on clean inputs (see the short worked example after these notes).
- 5. Consistent with prior work, we say “pruning a neuron” to mean reducing the number of output channels in a layer by one.
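To make the untargeted metric in note 4 concrete, here is a small worked computation; the accuracies used are hypothetical numbers, not results from the paper.

```python
# Untargeted attack success rate from note 4: 1 - A_backdoor / A_clean.
def untargeted_attack_success_rate(acc_backdoor: float, acc_clean: float) -> float:
    return 1.0 - acc_backdoor / acc_clean

# Hypothetical numbers: 90% clean accuracy, 27% accuracy on backdoored inputs.
print(untargeted_attack_success_rate(0.27, 0.90))  # ~0.70, i.e. about 70% attack success
```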
References
ImageNet large scale visual recognition competition. http://www.image-net.org/challenges/LSVRC/2012/ (2012)
Amazon Web Services Inc: Amazon Elastic Compute Cloud (Amazon EC2)
Amazon.com, Inc.: Deep Learning AMI Amazon Linux Version
Anwar, S.: Structured pruning of deep convolutional neural networks. ACM J. Emerg. Technol. Comput. Syst. (JETC) 13(3), 32 (2017)
Athalye, A., Carlini, N., Wagner, D.: Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, July 2018. https://arxiv.org/abs/1802.00420
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate (2014)
Barreno, M., Nelson, B., Sears, R., Joseph, A.D., Tygar, J.D.: Can machine learning be secure? In: Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security. ASIACCS 2006 (2006). https://doi.org/10.1145/1128817.1128824
Blum, A., Rivest, R.L.: Training a 3-node neural network is NP-complete. In: Advances in Neural Information Processing Systems, pp. 494–501 (1989)
Carlini, N., Wagner, D.A.: Defensive distillation is not robust to adversarial examples. CoRR abs/1607.04311 (2016). http://arxiv.org/abs/1607.04311
Chen, X., Liu, C., Li, B., Lu, K., Song, D.: Targeted backdoor attacks on deep learning systems using data poisoning. ArXiv e-prints, December 2017
Chung, S.P., Mok, A.K.: Allergy attack against automatic signature generation. In: Zamboni, D., Kruegel, C. (eds.) RAID 2006. LNCS, vol. 4219, pp. 61–80. Springer, Heidelberg (2006). https://doi.org/10.1007/11856214_4
Chung, S.P., Mok, A.K.: Advanced allergy attacks: does a corpus really help? In: Kruegel, C., Lippmann, R., Clark, A. (eds.) RAID 2007. LNCS, vol. 4637, pp. 236–255. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74320-0_13
Dhillon, G.S., et al.: Stochastic activation pruning for robust adversarial defense. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=H1uR4GZRZ
Fogla, P., Lee, W.: Evading network anomaly detection systems: formal reasoning and practical techniques. In: Proceedings of the 13th ACM Conference on Computer and Communications Security. CCS 2006 (2006). https://doi.org/10.1145/1180405.1180414
Fogla, P., Sharif, M., Perdisci, R., Kolesnikov, O., Lee, W.: Polymorphic blending attacks. In: USENIX-SS 2006 Proceedings of the 15th Conference on USENIX Security Symposium, vol. 15 (2006)
Google Inc: Google Cloud Machine Learning Engine. https://cloud.google.com/ml-engine/
Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6645–6649. IEEE (2013)
Gu, T., Garg, S., Dolan-Gavitt, B.: BadNets: identifying vulnerabilities in the machine learning model supply chain. In: NIPS Machine Learning and Computer Security Workshop (2017). https://arxiv.org/abs/1708.06733
Han, S., Mao, H., Dally, W.J.: Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. In: International Conference on Learning Representations (ICLR) (2016)
He, W., Wei, J., Chen, X., Carlini, N., Song, D.: Adversarial example defense: ensembles of weak defenses are not strong. In: 11th USENIX Workshop on Offensive Technologies (WOOT 2017). USENIX Association, Vancouver, BC (2017). https://www.usenix.org/conference/woot17/workshop-program/presentation/he
Hermann, K.M., Blunsom, P.: Multilingual distributed representations without word alignment. In: Proceedings of ICLR, April 2014. http://arxiv.org/abs/1312.6173
Iandola, F.N., Moskewicz, M.W., Ashraf, K., Keutzer, K.: FireCaffe: near-linear acceleration of deep neural network training on compute clusters. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2592–2600 (2016)
Karlberger, C., Bayler, G., Kruegel, C., Kirda, E.: Exploiting redundancy in natural language to penetrate Bayesian spam filters. In: Proceedings of the First USENIX Workshop on Offensive Technologies. WOOT 2007 (2007)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Li, H., et al.: Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710 (2016)
Liu, C., Li, B., Vorobeychik, Y., Oprea, A.: Robust linear regression against training data poisoning. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 91–102. ACM (2017)
Liu, Y., et al.: Trojaning attack on neural networks. In: 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, 18–21 February 2018. The Internet Society (2018)
Liu, Y., Xie, Y., Srivastava, A.: Neural trojans. CoRR abs/1710.00942 (2017). http://arxiv.org/abs/1710.00942
Lowd, D., Meek, C.: Adversarial learning. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. KDD 2005, pp. 641–647. ACM, New York (2005). https://doi.org/10.1145/1081870.1081950
Lowd, D., Meek, C.: Good word attacks on statistical spam filters. In: Proceedings of the Conference on Email and Anti-Spam (CEAS) (2005)
Microsoft Corporation: Azure Batch AI Training. https://batchaitraining.azure.com/
Møgelmose, A., Liu, D., Trivedi, M.M.: Traffic sign detection for US roads: remaining challenges and a case for tracking. In: 2014 IEEE 17th International Conference on Intelligent Transportation Systems (ITSC), pp. 1394–1399. IEEE (2014)
Molchanov, P., et al.: Pruning convolutional neural networks for resource efficient inference (2016)
Muñoz-González, L., et al.: Towards poisoning of deep learning algorithms with back-gradient optimization. CoRR abs/1708.08689 (2017). http://arxiv.org/abs/1708.08689
Nelson, B., et al.: Exploiting machine learning to subvert your spam filter. In: Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats. LEET 2008, pp. 7:1–7:9. USENIX Association, Berkeley (2008)
Newsome, J., Karp, B., Song, D.: Paragraph: thwarting signature learning by training maliciously. In: Zamboni, D., Kruegel, C. (eds.) RAID 2006. LNCS, vol. 4219, pp. 81–105. Springer, Heidelberg (2006). https://doi.org/10.1007/11856214_5
Papernot, N., McDaniel, P., Wu, X., Jha, S., Swami, A.: Distillation as a defense to adversarial perturbations against deep neural networks. In: 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597, May 2016. https://doi.org/10.1109/SP.2016.41
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
Suciu, O., Marginean, R., Kaya, Y., Daumé III, H., Dumitras, T.: When does machine learning FAIL? Generalized transferability for evasion and poisoning attacks. In: 27th USENIX Security Symposium (USENIX Security 18). USENIX Association, Baltimore (2018). https://www.usenix.org/conference/usenixsecurity18/presentation/suciu
Sun, Y., Wang, X., Tang, X.: Deep learning face representation from predicting 10,000 classes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1891–1898 (2014)
Tan, K.M.C., Killourhy, K.S., Maxion, R.A.: Undermining an anomaly-based intrusion detection system using common exploits. In: Proceedings of the 5th International Conference on Recent Advances in Intrusion Detection. RAID 2002 (2002)
Tung, F., Muralidharan, S., Mori, G.: Fine-pruning: joint fine-tuning and compression of a convolutional network with Bayesian optimization. In: British Machine Vision Conference (BMVC) (2017)
Wagner, D., Soto, P.: Mimicry attacks on host-based intrusion detection systems. In: Proceedings of the 9th ACM Conference on Computer and Communications Security. CCS 2002 (2002). https://doi.org/10.1145/586110.586145
Wittel, G.L., Wu, S.F.: On attacking statistical spam filters. In: Proceedings of the Conference on Email and Anti-Spam (CEAS), Mountain View, CA, USA (2004)
Wolf, L., Hassner, T., Maoz, I.: Face recognition in unconstrained videos with matched background similarity. In: CVPR 2011, pp. 529–534, June 2011. https://doi.org/10.1109/CVPR.2011.5995566
Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. CoRR abs/1708.07747 (2017). http://arxiv.org/abs/1708.07747
Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems, pp. 3320–3328 (2014)
Yu, J., et al.: Scalpel: customizing DNN pruning to the underlying hardware parallelism. In: Proceedings of the 44th Annual International Symposium on Computer Architecture, pp. 548–560. ACM (2017)
Acknowledgement
This research was partially supported by National Science Foundation CAREER Award #1553419.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, K., Dolan-Gavitt, B., Garg, S. (2018). Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds.) Research in Attacks, Intrusions, and Defenses. RAID 2018. Lecture Notes in Computer Science, vol. 11050. Springer, Cham. https://doi.org/10.1007/978-3-030-00470-5_13
DOI: https://doi.org/10.1007/978-3-030-00470-5_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00469-9
Online ISBN: 978-3-030-00470-5