
Why does batch normalization induce the model vulnerability on adversarial images?

Published in: World Wide Web

Abstract

Batch normalization is one of the most widely used components in deep neural networks. It accelerates training and boosts model performance on normal samples. However, batch normalization also makes models vulnerable to adversarial examples, especially on medical images, and the reason remains unclear. In this paper, we aim to explain the vulnerability induced by batch normalization under adversarial images. Specifically, we first discover that both natural and medical images contain a large number of trivial features, whose weights are enlarged under adversarial attacks and further enlarged by batch normalization. Additionally, we find that batch normalization reduces the inter-class margin of high-level features, leaving less tolerance to adversarial perturbations and thereby decreasing model robustness. Moreover, we hypothesize that the smaller the inter-class margin, the more difficult it is to attain the optimal classification space, which means batch normalization also restricts the performance of adversarial training. This further verifies that the narrower inter-class margin induced by batch normalization reduces model robustness. Experiments on four benchmark datasets support our discovery, interpretation and hypothesis.
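To make the first claim concrete, the following is a toy NumPy sketch (not the paper's implementation) of the batch normalization transform, showing how normalizing each feature over the batch equalizes the scale of a low-variance "trivial" feature with that of a salient feature:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature over the batch dimension, then rescale,
    # as in Ioffe & Szegedy's batch normalization.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Column 0: a "trivial" low-variance feature; column 1: a salient feature.
trivial = 0.1 * rng.standard_normal((256, 1))
salient = 1.0 * rng.standard_normal((256, 1))
x = np.concatenate([trivial, salient], axis=1)

y = batch_norm(x)
# After batch normalization, both columns have (near) unit variance:
# the trivial feature's relative scale is amplified to match the salient one.
print(x.std(axis=0), y.std(axis=0))
```

This is only an illustration of the normalization mechanics; the paper's argument concerns how this rescaling interacts with the weights learned under adversarial attacks.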


Notes

  1. You can access it from https://www.kaggle.com/c/imagenet-object-localization-challenge/data

  2. You can access it from https://www.kaggle.com/nih-chest-xrays/data

  3. You can access it from https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data

  4. You can access it from https://cocodataset.org/#download


Funding

This work is partially supported by the National Natural Science Foundation of China (Grant No: 61876046) and the Guangxi “Bagui” Teams for Innovation and Research.

Corresponding author

Correspondence to Xiaoshuang Shi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Web-based Intelligent Financial Services.

Guest Editors: Hong-Ning Dai, Xiaohui Haoran, and Miguel Martinez.

Appendices

Appendix A: Details about the Datasets

Tables 5 and 6 show the selected categories and the number of images used for training and testing. We select these classes because of their relatively large number of samples, which makes it easier to train our models and analyze the results.

Table 5 Selected categories and the number of selected images from the ILSVRC dataset
Table 6 Selected categories and the number of selected images from the COCO dataset

Appendix B: Details of Network Architectures

Table 7 shows the details of the convolutional layers of the VGG models; the locations for extracting mid-level and high-level features are marked with blank rows. Table 8 displays the details of the fully connected layers: the top block presents VGG16, and the bottom block presents VGG-C.

Table 7 Convolutional layers in VGG16 and VGG-C. In each Conv2d, bias is True and both the padding and stride sizes are 1; out_ch denotes the number of output channels and ks the kernel size. Each Conv2d layer is followed by batch normalization and ReLU. The first and second blank rows indicate that the layer above outputs the mid-level and high-level features, respectively
Table 8 The classifiers used in VGG16 and VGG-C. VGG16 adopts the classifier in the top block (rows above the blank row), and VGG-C employs the classifier in the bottom block (rows below it)
Table 9 The architecture of ResNet18. The first and second blank rows indicate that the layer above outputs the mid-level and high-level features, respectively
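As a quick sanity check on the layer settings in Table 7 (kernel size 3, stride 1, padding 1), the standard convolution output-size formula confirms that each Conv2d preserves the spatial resolution; only the pooling layers downsample. A minimal sketch (the input size 224 is illustrative, not taken from the paper):

```python
def conv2d_out_size(size, ks=3, stride=1, padding=1):
    # Standard formula for the output spatial size of a 2D convolution.
    return (size + 2 * padding - ks) // stride + 1

# With ks=3, stride=1, padding=1 as in Table 7, spatial size is unchanged:
print(conv2d_out_size(224))  # 224
```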

Appendix C: The Architecture of the Basic Block

Table 9 shows the details of ResNet18; the locations for extracting mid-level and high-level features are marked with blank rows. Figure 5 presents the details of the Basic Block used in Table 9.

Fig. 5 The architecture of the Basic Block
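The computation performed by such a residual Basic Block can be sketched as follows. This is a simplified NumPy illustration of the residual structure only: linear maps stand in for the 3x3 convolutions, and the batch-norm affine parameters are omitted, so it is not the paper's implementation:

```python
import numpy as np

def bn(x, eps=1e-5):
    # Per-feature batch normalization (affine parameters omitted).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def relu(x):
    return np.maximum(x, 0.0)

def basic_block(x, w1, w2):
    # Residual block: two weight layers, each followed by batch norm,
    # with ReLU in between and an identity shortcut around both,
    # mirroring the ResNet18 Basic Block structure.
    out = relu(bn(x @ w1))
    out = bn(out @ w2)
    return relu(out + x)  # identity shortcut, then final ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64))      # a batch of 32 feature vectors
w1 = 0.1 * rng.standard_normal((64, 64))
w2 = 0.1 * rng.standard_normal((64, 64))
y = basic_block(x, w1, w2)
print(y.shape)  # (32, 64): the block preserves the feature dimension
```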


About this article


Cite this article

Kong, F., Liu, F., Xu, K. et al. Why does batch normalization induce the model vulnerability on adversarial images?. World Wide Web 26, 1073–1091 (2023). https://doi.org/10.1007/s11280-022-01066-7
