A Novel Scheme for Adversarial Training to Improve the Robustness of DNN Against White Box Attacks

  • Conference paper
  • Part of the book series: Communications in Computer and Information Science (CCIS, volume 1777)
  • Included in the conference series: Computer Vision and Image Processing (CVIP 2022)


Abstract

Deep Neural Networks (DNNs) give excellent results in many applications, including image classification, yet their performance degrades noticeably when they are exposed to adversarial attacks. An adversarial attack fools a DNN by adding imperceptible perturbations to its input. Under white-box conditions, where the adversary has complete knowledge of the network and can craft strong perturbations through repeated iterations, the robustness of current defense methods is severely compromised. Inspecting the learned feature space of a DNN shows that samples of different classes lie in close proximity; consequently, an imperceptible perturbation can push the feature representation of an input away from the samples of its own class, forcing the model to completely change its decision even though the perturbation is unnoticeable. To counter such attacks, this work trains the DNN to maximize the distance between samples of different classes in the learned feature space.
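
The general recipe implied by the abstract, generating white-box adversarial examples during training and adding a loss term that pushes features of different classes apart, can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the authors' exact scheme: the PGD attack, the pairwise margin penalty, the hyperparameters, and the assumption that the model's forward pass returns a (features, logits) pair are all stand-ins introduced here for illustration.

# Minimal, illustrative sketch. Assumptions: model(x) returns (features, logits);
# the PGD attack and the pairwise margin penalty are generic stand-ins, not the
# paper's exact formulation.
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """White-box PGD: repeated gradient-sign steps inside an L-infinity ball."""
    x_adv = (x.detach() + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        _, logits = model(x_adv)
        grad = torch.autograd.grad(F.cross_entropy(logits, y), x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around the clean input and the valid range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()


def class_separation_loss(features, labels, margin=1.0):
    """Penalize pairs of samples from different classes whose features are
    closer than `margin` (one simple way to push class clusters apart)."""
    dists = torch.cdist(features, features)                   # (B, B) pairwise distances
    diff_class = labels.unsqueeze(0) != labels.unsqueeze(1)   # (B, B) different-class mask
    penalty = F.relu(margin - dists)[diff_class]
    return penalty.mean() if penalty.numel() > 0 else features.new_zeros(())


def adversarial_training_step(model, optimizer, x, y, lam=0.5):
    """One training step: cross-entropy on PGD examples plus the separation term."""
    x_adv = pgd_attack(model, x, y)
    features, logits = model(x_adv)
    loss = F.cross_entropy(logits, y) + lam * class_separation_loss(features, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In practice the backbone would expose its penultimate-layer activations alongside the logits, for example by returning both from forward(), so that the separation term acts directly on the learned feature space rather than on the class scores.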



Acknowledgement

The authors sincerely acknowledge the contributions of Dr. Renu M. Rameshan, Assistant Professor, IIT Mandi, in bringing out this work.

Author information

Corresponding author

Correspondence to P. P. Deepthi.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Rohith, N.S.M., Deepthi, P.P. (2023). A Novel Scheme for Adversarial Training to Improve the Robustness of DNN Against White Box Attacks. In: Gupta, D., Bhurchandi, K., Murala, S., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2022. Communications in Computer and Information Science, vol 1777. Springer, Cham. https://doi.org/10.1007/978-3-031-31417-9_28

  • DOI: https://doi.org/10.1007/978-3-031-31417-9_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31416-2

  • Online ISBN: 978-3-031-31417-9

  • eBook Packages: Computer Science, Computer Science (R0)
