Abstract
Recent studies have shown that deep neural networks (DNNs) are vulnerable to adversarial examples (AEs). Denoising based on input pre-processing is one defense against adversarial attacks. However, it is difficult to remove multiple kinds of adversarial perturbation, especially in the presence of evolving attacks. To address this challenge, we attempt to extract the commonality of adversarial perturbations. Because adversarial perturbations are imperceptible in the input space, we conduct the extraction in the deep feature space, where the perturbations become more apparent. Using the obtained common characteristics, we craft common adversarial examples (CAEs) to train the denoiser. Furthermore, to prevent image distortion while removing as much of the adversarial perturbation as possible, we propose a hybrid loss function that guides training at both the pixel level and in the deep feature space. Our experiments show that our defense can eliminate multiple adversarial perturbations, significantly enhancing adversarial robustness compared with previous state-of-the-art methods. Moreover, it is plug-and-play for various classification models, which demonstrates the generalizability of our defense.
Data Availability and Access
The data used to support the findings of this study are available from the corresponding author upon request.
Acknowledgements
This work was supported by the Major Research Plan of the National Natural Science Foundation of China (92167203), Key Program of Zhejiang Provincial Natural Science Foundation of China (LZ22F020007), the Major Key Project of Peng Cheng Laboratory (2022A03), and Science and Technology Innovation Foundation for Graduate Students of Zhejiang University of Science and Technology (F464108M05).
Author information
Authors and Affiliations
Contributions
Jianchang Huang: Conceptualization, Methodology, Formal analysis, Investigation, Writing - original draft, Writing - review & editing, Software. Yaguan Qian: Conceptualization, Methodology, Formal analysis, Investigation, Writing - original draft, Writing - review & editing, Funding acquisition. Yinyao Dai: Data curation, Investigation, Writing - Review & Editing. Fang Lu: Writing - Review & Editing, Investigation. Bin Wang: Writing - Review & Editing, Funding acquisition. Zhaoquan Gu: Writing - Review & Editing. Boyang Zhou: Writing - Review & Editing.
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Ethics Approval
The data used in this paper do not involve ethical issues.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Huang, J., Dai, Y., Lu, F. et al. Adversarial perturbation denoising utilizing common characteristics in deep feature space. Appl Intell 54, 1672–1690 (2024). https://doi.org/10.1007/s10489-023-05253-5