Abstract
Adversarial training (AT) aims to improve a model's robustness against adversarial attacks by mixing clean data and adversarial examples (AEs) into training. Most existing AT approaches fall into two groups: restricted and unrestricted. Restricted AT prescribes a uniform perturbation budget for the AEs used in training, and the resulting models are highly sensitive to that budget. Unrestricted AT instead uses unconstrained AEs, whose overestimated perturbations significantly lower both clean accuracy and robustness against small-budget attacks. Consequently, existing AT approaches struggle to obtain a comprehensively robust model against attacks with an unknown budget, which we name blind adversarial attacks. To address this problem, this paper proposes a novel AT approach named blind adversarial training (BAT). Its main idea is a cutoff-scale strategy that adaptively estimates a nonuniform budget and modifies the AEs used in training accordingly, keeping the strengths of the AEs within a reasonable range and thereby improving the comprehensive robustness of the trained model. A theoretical investigation on a toy classification problem supports the improvement offered by BAT, and experimental results demonstrate that BAT achieves better comprehensive robustness than AT with several types of AEs.
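The abstract does not spell out the exact cutoff-scale rule, but the idea can be sketched. Below is a minimal, hypothetical PyTorch sketch of one BAT-style training step: the inner loop grows an unbudgeted perturbation (as in unrestricted AT), and a cutoff-scale post-processing step caps and shrinks each perturbation so the AE strengths stay in a moderate range. The function name `bat_training_step` and the `cutoff`, `scale`, `steps`, and `alpha` parameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def bat_training_step(model, optimizer, x, y,
                      steps=10, alpha=0.01, cutoff=0.3, scale=0.5):
    """One AT step with a hypothetical cutoff-scale budget heuristic."""
    # Unrestricted stage: unbudgeted gradient-ascent steps on the input.
    model.eval()  # freeze BN statistics while crafting the AEs
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()

    with torch.no_grad():
        # Cutoff: cap the perturbation's per-pixel magnitude, discarding
        # the overestimated part of the unrestricted AEs.
        delta.clamp_(-cutoff, cutoff)
        # Scale: shrink the remaining perturbation. The paper estimates
        # this adaptively (nonuniform budget); a fixed factor is used
        # here only for simplicity.
        delta *= scale
    x_adv = (x + delta).detach()

    # Standard AT update: mix clean data and the modified AEs.
    model.train()
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y)
                  + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the paper's setting, `cutoff` and `scale` would be estimated adaptively per example or per batch (e.g., from the distribution of perturbation strengths) rather than fixed as in this sketch.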
Acknowledgment
This work was supported by the National Natural Science Foundation of China (Grant No. 12004422) and by the Beijing Nova Program of Science and Technology (Grant No. Z191100001119129).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Xie, H., Xiang, X., Dong, B., Liu, N. (2024). Blind Adversarial Training: Towards Comprehensively Robust Models Against Blind Adversarial Attacks. In: Fang, L., Pei, J., Zhai, G., Wang, R. (eds) Artificial Intelligence. CICAI 2023. Lecture Notes in Computer Science, vol. 14474. Springer, Singapore. https://doi.org/10.1007/978-981-99-9119-8_2
DOI: https://doi.org/10.1007/978-981-99-9119-8_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9118-1
Online ISBN: 978-981-99-9119-8