Abstract
Recent advances in deep neural network (DNN) techniques have increased the importance of the security and robustness of algorithms in which DNNs are applied. However, several studies have demonstrated that neural networks are vulnerable to adversarial examples, which are generated by adding crafted adversarial noise to input images. Because the adversarial noise is typically imperceptible to the human eye, defending DNNs is difficult. One defense is to detect adversarial examples by analyzing the characteristics of input images. Recent studies have used the hidden-layer outputs of the target classifier to improve robustness, but this requires access to the target classifier. Moreover, these methods include no post-processing step for the detected adversarial examples; the detected adversarial images are simply discarded. To resolve these problems, we propose a novel detection-based method that predicts the adversarial noise and detects the adversarial example based on the predicted noise, without any information about the target classifier. We first generated adversarial examples and the corresponding adversarial noise, obtained as the residual between the original and adversarial images. Subsequently, we trained the proposed adversarial noise predictor to estimate the adversarial noise image and trained the adversarial detector using the input images and the predicted noise. The proposed framework has the advantage of being agnostic to the input image modality. Moreover, the predicted noise can be used to reconstruct detected adversarial examples into non-adversarial images instead of discarding them. We tested the proposed method against the fast gradient sign method (FGSM), basic iterative method (BIM), projected gradient descent (PGD), DeepFool, and Carlini & Wagner attacks on the CIFAR-10 and CIFAR-100 datasets provided by the Canadian Institute for Advanced Research (CIFAR).
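The training-data pipeline described above, generating an adversarial example and taking the residual as the noise target, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a toy linear scorer whose loss gradient is known in closed form so that the FGSM step can be written without an autodiff framework, and the function name `fgsm_example` is hypothetical.

```python
import numpy as np

def fgsm_example(x, w, y, eps):
    """Craft an FGSM adversarial example for a toy linear scorer.

    Toy loss: L(x) = -y * (w . x), whose gradient w.r.t. x is -y * w.
    FGSM perturbs the input by eps * sign(gradient) and clips to [0, 1].
    """
    grad = -y * w                                   # analytic gradient of the toy loss
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.uniform(0.2, 0.8, size=(8, 8))              # stand-in "image" in [0.2, 0.8]
w = rng.normal(size=(8, 8))                         # toy classifier weights
x_adv = fgsm_example(x, w, y=1.0, eps=0.03)

# Training target for the noise predictor: the residual between the
# adversarial example and the original image.
noise = x_adv - x
print(np.abs(noise).max())                          # bounded by eps

# Once a predictor estimates this noise, the detected adversarial example
# can be reconstructed rather than discarded: x_rec = x_adv - predicted_noise.
x_rec = x_adv - noise
print(np.allclose(x_rec, x))
```

With a perfect noise estimate the reconstruction recovers the original image exactly; in practice the predictor's output is an approximation.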
Our method demonstrated significant improvements in detection accuracy compared with state-of-the-art methods and resolved the problem of wasting the detected adversarial examples. Although agnostic to the input image modality, the proposed method showed that the noise predictor successfully captured noise in the Fourier domain, which improved detection performance. Moreover, the reconstruction process using the predicted noise resolved the post-processing problem for the detected adversarial examples.
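The Fourier-domain view of the noise mentioned above can be illustrated with a short sketch. This is an assumption-laden toy example, not the paper's predictor: the feature is simply the log-magnitude spectrum of a residual image, and `fourier_feature` is a hypothetical name. An FGSM-like residual (a dense ±ε sign pattern) carries broad spectral energy, whereas a clean input's residual is near zero, which is the kind of separation a Fourier-domain detector can exploit.

```python
import numpy as np

def fourier_feature(residual):
    """Log-magnitude spectrum of a (predicted) noise image.

    2-D FFT, shifted so the zero-frequency term sits at the centre;
    log1p compresses the dynamic range for the downstream detector.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(residual))
    return np.log1p(np.abs(spectrum))

rng = np.random.default_rng(1)
clean_residual = np.zeros((32, 32))                        # clean input: ~zero noise
adv_residual = 0.03 * np.sign(rng.normal(size=(32, 32)))   # FGSM-like residual

# The adversarial residual has strictly more spectral energy.
print(fourier_feature(adv_residual).mean() > fourier_feature(clean_residual).mean())
```

A simple detector could threshold (or learn a classifier on) such spectral features of the predicted noise.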
Data availability
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Acknowledgements
This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2021-0-00511, Robust AI and Distributed Attack Detection for Edge AI Security).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Author agreement
All authors, including Seunghwan Jung, Minyoung Chung, and Yeong-Gil Shin, agreed to the submission of this manuscript.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jung, S., Chung, M. & Shin, YG. Adversarial example detection by predicting adversarial noise in the frequency domain. Multimed Tools Appl 82, 25235–25251 (2023). https://doi.org/10.1007/s11042-023-14608-6