Abstract
Existing supervised-learning detectors may perform poorly on unseen adversarial examples (AEs) because they tend to be sensitive to their training samples. We found that (1) a CNN classifier is moderately robust against AEs generated on other CNNs, and (2) this adversarial robustness is rarely affected by unseen instances. We therefore construct an attack-agnostic detector based on an adversarially robust surrogate CNN to detect unknown AEs. Specifically, for a protected CNN classifier, we design a surrogate CNN classifier and flag an image as an AE when the two classifiers assign it different labels. To detect transferable AEs while maintaining a low false positive rate, the surrogate model is distilled from the protected model, with the aim of enhancing its adversarial robustness (i.e., suppressing the transferability of AEs) while mimicking the protected model's output on clean images. To defend against potential ensemble attacks targeting our detector, we propose a new adversarial training scheme that enhances the security of the proposed detector. Generalization experiments on CIFAR-10 and ImageNet-20 show that our method detects unseen AEs effectively and performs much better than state-of-the-art methods.
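The detection rule and the distillation idea described above can be sketched in a few lines. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation: `protected` and `surrogate` are hypothetical toy classifiers that return logits, `is_adversarial` simply compares their top-1 labels, and `distillation_loss` is a standard temperature-scaled KL divergence (Hinton-style knowledge distillation) that covers only the "mimic the protected model on clean images" part of the objective. The robustness-enhancing term and the adversarial training scheme against ensemble attacks are not modeled, and none of these names come from the paper.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical type: a classifier maps an input vector to a list of logits.
Classifier = Callable[[Sequence[float]], List[float]]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def is_adversarial(x: Sequence[float], protected: Classifier,
                   surrogate: Classifier) -> bool:
    """Flag x as an AE when the protected and surrogate models disagree."""
    return argmax(protected(x)) != argmax(surrogate(x))


def false_positive_rate(clean_inputs: Sequence[Sequence[float]],
                        protected: Classifier, surrogate: Classifier) -> float:
    """Fraction of clean inputs that the disagreement rule wrongly flags."""
    flagged = sum(is_adversarial(x, protected, surrogate) for x in clean_inputs)
    return flagged / len(clean_inputs)


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Sequence[float],
                      student_logits: Sequence[float],
                      temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened outputs, scaled by T^2.

    Covers only the 'mimic the protected model on clean images' term;
    the robustness term of the paper's objective is not modeled here.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2


# Toy 3-class "models" on 2-D inputs (hypothetical stand-ins for CNNs).
def protected(x: Sequence[float]) -> List[float]:
    return [x[0], x[1], -x[0] - x[1]]


def surrogate(x: Sequence[float]) -> List[float]:
    return [x[0], 0.5 * x[1], -x[0] - x[1]]


if __name__ == "__main__":
    print(is_adversarial((2.0, 1.0), protected, surrogate))  # both predict class 0
    print(is_adversarial((1.0, 1.5), protected, surrogate))  # labels 1 vs 0
    print(false_positive_rate([(2.0, 1.0), (3.0, 0.5)], protected, surrogate))
```

The rule itself needs no AE training data, which is what makes it attack-agnostic in this sketch; how well it works depends on the surrogate agreeing with the protected model on clean inputs (low false positive rate) while disagreeing on transferred AEs.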
Notes
1. ImageNet-20 comprises 20 categories: tench, harvestman, dunlin, brambling, black grouse, sea lion, water ouzel, lorikeet, Afghan, bullfrog, black swan, anole, wolfhound, flatworm, pit bull terrier, alligator, fiddler crab, Sealyham, night snake, flamingo.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Peng, A., Deng, K., Zeng, H., Wu, K., Yu, W. (2024). Detecting Adversarial Examples via Classification Difference of a Robust Surrogate Model. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1966. Springer, Singapore. https://doi.org/10.1007/978-981-99-8148-9_43
DOI: https://doi.org/10.1007/978-981-99-8148-9_43
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8147-2
Online ISBN: 978-981-99-8148-9
eBook Packages: Computer Science, Computer Science (R0)