Detecting Adversarial Examples via Classification Difference of a Robust Surrogate Model

Peng, Anjie; Deng, Kang; Zeng, Hui; Wu, Kaijun; Yu, Wenxin

doi:10.1007/978-981-99-8148-9_43

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1966)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Existing supervised detectors may perform worse on unseen adversarial examples (AEs) because they are sensitive to their training samples. We found that (1) a CNN classifier is modestly robust against AEs generated from other CNNs, and (2) this adversarial robustness is rarely affected by unseen instances. We therefore construct an attack-agnostic detector based on an adversarially robust surrogate CNN to detect unknown AEs. Specifically, for a protected CNN classifier, we design a surrogate CNN classifier and flag as an AE any image to which the two classifiers assign different labels. To detect transferable AEs while maintaining a low false positive rate, the surrogate model is distilled from the protected model, with the aim of enhancing adversarial robustness (i.e., suppressing the transferability of AEs) while mimicking the protected model's output on clean images. To defend against a potential ensemble attack targeting our detector, we propose a new adversarial training scheme that enhances the detector's security. Generalization tests on CIFAR-10 and ImageNet-20 show that our method detects unseen AEs effectively and performs much better than state-of-the-art methods.
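
The decision rule described in the abstract (flag an image when the protected and surrogate classifiers disagree on its label) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the names `predict_label`, `is_adversarial`, `toy_protected`, and `toy_surrogate` are hypothetical, and the two toy scoring functions stand in for real CNNs. The distillation and adversarial training steps are not shown.

```python
from typing import Callable, List, Sequence

# A "classifier" here is any callable mapping an image (a flat list of
# floats) to a list of per-class scores. Real models would be CNNs.
Classifier = Callable[[Sequence[float]], List[float]]


def predict_label(model: Classifier, image: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    scores = model(image)
    return max(range(len(scores)), key=lambda i: scores[i])


def is_adversarial(protected: Classifier,
                   surrogate: Classifier,
                   image: Sequence[float]) -> bool:
    """Flag the image as an AE when the two classifiers disagree."""
    return predict_label(protected, image) != predict_label(surrogate, image)


# Hypothetical toy stand-ins (assumptions for illustration only).
def toy_protected(image: Sequence[float]) -> List[float]:
    # Two-class scorer with a decision boundary at image[0] == image[1].
    return [image[0] - image[1], image[1] - image[0]]


def toy_surrogate(image: Sequence[float]) -> List[float]:
    # Same scorer with a shifted boundary, so inputs lying close to the
    # protected model's boundary can receive a different label.
    return [image[0] - image[1] + 0.3, image[1] - image[0]]


if __name__ == "__main__":
    samples = {
        "far from boundary": [0.9, 0.1],
        "near boundary": [0.4, 0.5],
    }
    for name, image in samples.items():
        flagged = is_adversarial(toy_protected, toy_surrogate, image)
        print(f"{name}: flagged={flagged}")
```

In this sketch the detector needs no training on known attacks; its behavior depends entirely on how the surrogate is built, which is why the abstract emphasizes agreement on clean images (to keep false positives low) and robustness to transferred AEs (so that labels diverge on attacked inputs).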

Notes

1. 20 categories: Tench, harvestman, dunlin, brambling, black grouse, sea lion, water ouzel, lorikeet, Afghan, bullfrog, black swan, anole, wolfhound, flatworm, pit bull terrier, alligator, fiddler crab, Sealyham, night snake, flamingo.

Author information

Authors and Affiliations

Southwest University of Science and Technology, Mianyang, 621010, Sichuan, China
Anjie Peng, Kang Deng, Hui Zeng & Wenxin Yu
Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing, China
Anjie Peng
Science and Technology on Communication Security Laboratory, Chengdu, 610041, Sichuan, China
Kaijun Wu

Corresponding author

Correspondence to Hui Zeng.

Editor information

Editors and Affiliations

School of Automation, Central South University, Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangzhou, China
Hongyi Li
School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

Cite this paper

Peng, A., Deng, K., Zeng, H., Wu, K., Yu, W. (2024). Detecting Adversarial Examples via Classification Difference of a Robust Surrogate Model. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1966. Springer, Singapore. https://doi.org/10.1007/978-981-99-8148-9_43

DOI: https://doi.org/10.1007/978-981-99-8148-9_43
Published: 26 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8147-2
Online ISBN: 978-981-99-8148-9
eBook Packages: Computer Science, Computer Science (R0)
