Abstract
Adversarial training (AT) aims to improve a model's robustness against adversarial attacks by mixing clean data and adversarial examples (AEs) into training. Most existing AT approaches fall into two groups: restricted and unrestricted. Restricted AT prescribes a uniform perturbation budget for the AEs used in training, and the resulting models are highly sensitive to that budget. Unrestricted AT instead uses unconstrained AEs, whose overestimated perturbations significantly lower both clean accuracy and robustness against small-budget attacks. Consequently, existing AT approaches struggle to obtain a comprehensively robust model against attacks with an unknown budget, which we name blind adversarial attacks. To address this problem, this paper proposes a novel AT approach named blind adversarial training (BAT). Its main idea is a cutoff-scale strategy that adaptively estimates a nonuniform budget and modifies the AEs used in training accordingly, keeping the strengths of the AEs within a reasonable range and thereby improving the comprehensive robustness of the trained model. A theoretical investigation on a toy classification problem supports the improvement offered by BAT, and experimental results demonstrate that BAT achieves better comprehensive robustness than AT with several types of AEs.
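The abstract does not spell out the exact cutoff-scale rule, but the idea can be sketched. Below is a minimal, hypothetical PyTorch sketch of one BAT-style training step: the inner loop grows an unbudgeted perturbation (as in unrestricted AT), and a cutoff-scale post-processing step caps and shrinks each perturbation so the AE strengths stay in a moderate range. The function name `bat_training_step` and the `cutoff`, `scale`, `steps`, and `alpha` parameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def bat_training_step(model, optimizer, x, y,
                      steps=10, alpha=0.01, cutoff=0.3, scale=0.5):
    """One AT step with a hypothetical cutoff-scale budget heuristic."""
    # Unrestricted stage: unbudgeted gradient-ascent steps on the input.
    model.eval()  # freeze BN statistics while crafting the AEs
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()

    with torch.no_grad():
        # Cutoff: cap the perturbation's per-pixel magnitude, discarding
        # the overestimated part of the unrestricted AEs.
        delta.clamp_(-cutoff, cutoff)
        # Scale: shrink the remaining perturbation. The paper estimates
        # this adaptively (nonuniform budget); a fixed factor is used
        # here only for simplicity.
        delta *= scale
    x_adv = (x + delta).detach()

    # Standard AT update: mix clean data and the modified AEs.
    model.train()
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y)
                  + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the paper's setting, `cutoff` and `scale` would be estimated adaptively per example or per batch (e.g., from the distribution of perturbation strengths) rather than fixed as in this sketch.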
Acknowledgment
This work was supported by the National Natural Science Foundation of China (Grant No. 12004422) and by the Beijing Nova Program of Science and Technology (Grant No. Z191100001119129).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Xie, H., Xiang, X., Dong, B., Liu, N. (2024). Blind Adversarial Training: Towards Comprehensively Robust Models Against Blind Adversarial Attacks. In: Fang, L., Pei, J., Zhai, G., Wang, R. (eds) Artificial Intelligence. CICAI 2023. Lecture Notes in Computer Science, vol. 14474. Springer, Singapore. https://doi.org/10.1007/978-981-99-9119-8_2
DOI: https://doi.org/10.1007/978-981-99-9119-8_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9118-1
Online ISBN: 978-981-99-9119-8