Abstract
With the advancement of hardware accelerators in recent years, there has been a surge in the development and deployment of intelligent systems. Deep learning systems in particular have shown impressive results on a wide range of tasks, including classification, detection, and recognition. Despite these achievements, one problem critical to the safety of such systems remains open: deep learning models are brittle against adversarial attacks. That is, carefully crafted adversarial inputs can consistently trigger erroneous outputs from a network model. Motivated by this vulnerability, we survey four adversarial attacks and two defense methods on three benchmark datasets to better understand how to protect these systems. We ground our findings by training models to state-of-the-art accuracy and collecting empirical evidence of attack effectiveness against deep neural networks. Additionally, we leverage network explainability methods to investigate an alternative approach to defending deep neural networks.
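To make the attack model concrete, the sketch below shows one canonical gradient-based attack, the fast gradient sign method (FGSM), which perturbs an input in the direction that most increases the classifier's loss. This is an illustrative example only, not the survey's own implementation; the PyTorch classifier `model`, the perturbation budget `epsilon`, and the `[0, 1]` pixel range are assumptions for the sketch.

import torch
import torch.nn.functional as F

def fgsm_attack(model: torch.nn.Module,
                x: torch.Tensor,
                y: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """Craft an adversarial example via the fast gradient sign method.

    Adds an L-infinity-bounded perturbation of size `epsilon` in the
    direction that increases the cross-entropy loss, then clips the
    result back to the valid [0, 1] image range.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # loss w.r.t. the true labels y
    loss.backward()                        # gradient of the loss w.r.t. x
    x_adv = x + epsilon * x.grad.sign()    # one signed-gradient step
    return x_adv.clamp(0.0, 1.0).detach()

A small `epsilon` (e.g., 8/255 on images) is typically imperceptible to a human observer yet often flips the model's prediction, which is precisely the brittleness this survey examines.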






