Abstract
Deep Neural Networks (DNNs) have achieved great success in many tasks in recent years. However, researchers have found that DNNs are vulnerable to adversarial examples, i.e., maliciously perturbed inputs. Elaborately designed adversarial perturbations can easily confuse a model while having no perceptible effect on human perception. To counter adversarial examples, we propose an integrated detection framework that combines a statistical detector and a Gaussian noise injection detector. The statistical detector extracts the Subtractive Pixel Adjacency Matrix (SPAM) and models it with a second-order Markov transition probability matrix to highlight the statistical anomalies hidden in an adversarial input; an ensemble classifier built on the SPAM-based features is then applied to detect adversarial inputs containing large perturbations. The Gaussian noise injection detector first injects additive Gaussian noise into the input and then feeds both the original input and its noise-injected counterpart into the targeted network. By comparing the difference between the two outputs, it detects adversarial inputs containing small perturbations: if the difference exceeds a threshold, the input is judged adversarial; otherwise it is judged legitimate. Because the two detectors are adapted to different characteristics of adversarial perturbations, the proposed framework is capable of detecting multiple types of adversarial examples. In our experiments, we test six categories of adversarial examples produced by the Fast Gradient Sign Method (FGSM, untargeted), the Randomized Fast Gradient Sign Method (R-FGSM, untargeted), the Basic Iterative Method (BIM, untargeted), DeepFool (untargeted), and the Carlini & Wagner method in untargeted (CW_UT) and targeted (CW_T) settings. Comprehensive empirical results show that the proposed detection framework achieves promising performance on the ImageNet database.
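To make the two detectors concrete, the minimal sketch below illustrates their general ideas in Python/NumPy. The `spam_features_horizontal` function computes a second-order Markov transition feature over horizontal pixel differences in the spirit of SPAM (only one of SPAM's directions, without the paper's exact truncation and averaging choices), and `gaussian_noise_detector` flags an input whose output distribution shifts by more than a threshold after additive Gaussian noise. The `model` callable, `sigma`, `threshold`, `n_trials`, and the L1 output-difference measure are illustrative assumptions, not the parameters or exact measures used in the paper.

```python
import numpy as np

def spam_features_horizontal(img, T=3):
    """Second-order SPAM-style feature for the left-to-right direction only.
    The full SPAM feature (Pevny et al., 2010) averages several directions;
    this single-direction version is an illustrative simplification.
    `img` is a 2-D grayscale array."""
    d = img[:, :-1].astype(int) - img[:, 1:].astype(int)   # adjacent-pixel differences
    d = np.clip(d, -T, T)                                   # truncate to [-T, T]
    dim = 2 * T + 1
    counts = np.zeros((dim, dim, dim))
    for row in d:                                           # count consecutive difference triples
        for i, j, k in zip(row[:-2], row[1:-1], row[2:]):
            counts[i + T, j + T, k + T] += 1
    totals = np.maximum(counts.sum(axis=2, keepdims=True), 1)
    return (counts / totals).ravel()                        # empirical transition probabilities

def gaussian_noise_detector(model, x, sigma=0.05, threshold=0.1, n_trials=5):
    """Flag `x` as adversarial if the model's softmax output changes too much
    after additive Gaussian noise.  `model` maps an image in [0, 1] to a
    probability vector; sigma and threshold are hypothetical values that
    would need tuning on a validation set."""
    probs_clean = model(x)
    diffs = []
    for _ in range(n_trials):
        noisy = np.clip(x + np.random.normal(0.0, sigma, size=x.shape), 0.0, 1.0)
        # L1 distance between the two output distributions (one plausible
        # choice of difference measure)
        diffs.append(np.abs(probs_clean - model(noisy)).sum())
    return float(np.mean(diffs)) > threshold                # True -> judged adversarial
```

In the integrated framework, the SPAM-based ensemble classifier would handle inputs with large perturbations, while a noise-injection test of this kind targets the small perturbations that statistical features tend to miss.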
References
Carlini N, Wagner D (2017) Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP), pp. 39–57
Das N, Shanbhogue M, Chen ST, Hohman F, Chen L, Kounavis ME, Chau DH (2017) Keeping the bad guys out: protecting and vaccinating deep learning with JPEG compression. arXiv preprint arXiv:1705.02900
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255
Dziugaite GK, Ghahramani Z, Roy DM (2016) A study of the effect of JPG compression on adversarial images. arXiv preprint arXiv:1608.00853
Eykholt K, Evtimov I, Fernandes E, Li B, Rahmati A, Xiao C, Song D (2018) Robust physical-world attacks on deep learning models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR)
Grosse K, Manoharan P, Papernot N, Backes M, McDaniel P (2017) On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280
Gryllias KC, Antoniadis IA (2012) A support vector machine approach based on physical model training for rolling element bearing fault detection in industrial environments. Eng Appl Artif Intell 25(2):326–344
Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531
Kodovsky J, Fridrich J, Holub V (2012) Ensemble classifiers for steganalysis of digital media. IEEE Trans Inform Forens Sec 7(2):432–444
Kurakin A, Goodfellow I, Bengio S (2017) Adversarial examples in the physical world. In International Conference on Learning Representations (ICLR) Workshop Track
Li X, Li F (2017) Adversarial examples detection in deep networks with convolutional filter statistics. In International Conference on Computer Vision (ICCV), pp. 5775–5783
Liu Y, Zheng Y, Liang Y, Liu S, Rosenblum DS (2016) Urban water quality prediction based on multi-task multi-view learning. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 2576–2582
Liu Y, Zhang L, Nie L, Yan Y, Rosenblum DS (2016) Fortune teller: predicting your career path. In AAAI Conference on Artificial Intelligence (AAAI), pp. 201–207
Liu Y, Nie L, Liu L, Rosenblum DS (2016) From action to activity: sensor-based activity recognition. Neurocomputing 181:108–115
Lu J, Issaranon T, Forsyth D (2017) SafetyNet: detecting and rejecting adversarial examples robustly. In International Conference on Computer Vision (ICCV), pp. 446–454
Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083
Meng D, Chen H (2017) MagNet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS), pp. 135–147
Metzen JH, Genewein T, Fischer V, Bischoff B (2017) On detecting adversarial perturbations. In International Conference on Learning Representations (ICLR)
Miyato T, Dai A M, Goodfellow I (2016) Adversarial training methods for semi-supervised text classification. arXiv preprint arXiv:1605.07725
Moosavi-Dezfooli SM, Fawzi A, Frossard P (2016) DeepFool: a simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2574–2582
Papernot N, McDaniel P, Sinha A, Wellman M (2016) Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814
Papernot N, McDaniel P, Wu X, Jha S, Swami A (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE European Symposium on Security and Privacy (SP), pp. 582–597
Papernot N, Goodfellow I, Sheatsley R, Feinman R, McDaniel P (2016) cleverhans v2.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768
Pevny T, Bas P, Fridrich J (2010) Steganalysis by subtractive pixel adjacency matrix. IEEE Trans Inform Forens Sec 5(2):215–224
Santhanam GK, Grnarova P (2018) Defending against adversarial attacks by leveraging an entire GAN. arXiv preprint arXiv:1805.10652
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
Sharif M, Bhagavatula S, Bauer L, Reiter MK (2016) Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS), pp. 1528–1540
Shen S, Jin G, Gao K, Zhang Y (2017) APE-GAN: adversarial perturbation elimination with GAN. arXiv preprint arXiv:1707.05474
Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R (2014) Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR) Workshop Track
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826
Tramèr F, Kurakin A, Papernot N, Goodfellow I, Boneh D, McDaniel P (2018) Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations (ICLR)
Xie C, Wang J, Zhang Z, Ren Z, Yuille A (2018) Mitigating adversarial effects through randomization. In International Conference on Learning Representations (ICLR)
Xu W, Evans D, Qi Y (2017) Feature squeezing: detecting adversarial examples in deep neural networks. In Network and Distributed System Security Symposium (NDSS)
Acknowledgements
This work was supported by Shanghai Municipal Natural Science Foundation under Grant No. 16ZR1411100 and the National Natural Science Foundation of China under Grant No. 61771301.
Cite this article
Fan, W., Sun, G., Su, Y. et al. Integration of statistical detector and Gaussian noise injection detector for adversarial example detection in deep neural networks. Multimed Tools Appl 78, 20409–20429 (2019). https://doi.org/10.1007/s11042-019-7353-6