Abstract
Neural networks (NNs) are known to be susceptible to adversarial examples (AEs), which are intentionally crafted to deceive a target classifier by adding small perturbations to its inputs. Interestingly, AEs crafted for one NN can often mislead another model. This property, referred to as transferability, is frequently exploited to mount attacks in black-box settings. To mitigate the transferability of AEs, many approaches have been explored to enhance the robustness of NNs; in particular, adversarial training (AT) and its variants have been shown to be the strongest defenses against such transferable AEs. In this paper, we propose a novel AE generation method that boosts the transferability of AEs against robust models that have undergone AT. Our method is motivated by the observation that AT-robust models are more sensitive to perceptually-relevant gradients, so it is reasonable to synthesize AEs from perturbations that carry perceptually-aligned features. The proposed method proceeds in two steps. First, by optimizing the loss function over an ensemble of randomly noised inputs, we obtain perceptually-aligned perturbations with a noise-invariant property. Second, we apply the Perona–Malik (P–M) filter to smooth the derived adversarial perturbations, which significantly reinforces their perceptually-relevant features and substantially suppresses their local oscillations. Our method can be combined with any gradient-based attack. Extensive experiments on the ImageNet dataset with various robust and non-robust models demonstrate the effectiveness of our method. In particular, by combining our method with the diverse inputs method and the momentum iterative fast gradient sign method, we achieve state-of-the-art performance in fooling robust models.
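To make the two steps concrete, the following is a minimal sketch (not the authors' released implementation) of how such a pipeline could be assembled on top of a momentum iterative attack: gradients are averaged over randomly noised copies of the input to obtain a noise-invariant, perceptually-aligned direction, and that direction is smoothed with a Perona–Malik filter before each update. It assumes a PyTorch image classifier `model`, inputs scaled to [0, 1], and illustrative hyperparameters (`sigma`, `kappa`, `n_noise`) that are not taken from the paper.

```python
import torch
import torch.nn.functional as F


def perona_malik(u, n_iter=10, kappa=0.1, lam=0.25):
    """Explicit-scheme Perona-Malik anisotropic diffusion on a (B, C, H, W) tensor."""
    for _ in range(n_iter):
        # finite differences toward the four neighbours
        dN = torch.roll(u, -1, dims=-2) - u
        dS = torch.roll(u, 1, dims=-2) - u
        dE = torch.roll(u, -1, dims=-1) - u
        dW = torch.roll(u, 1, dims=-1) - u
        # Perona-Malik diffusivity g(d) = exp(-(d / kappa)^2): small near edges, ~1 in flat regions
        cN, cS = torch.exp(-(dN / kappa) ** 2), torch.exp(-(dS / kappa) ** 2)
        cE, cW = torch.exp(-(dE / kappa) ** 2), torch.exp(-(dW / kappa) ** 2)
        u = u + lam * (cN * dN + cS * dS + cE * dE + cW * dW)
    return u


def attack(model, x, y, eps=16 / 255, steps=10, n_noise=20, sigma=0.05, mu=1.0):
    """MI-FGSM-style attack whose gradient is averaged over noised inputs and P-M smoothed."""
    alpha = eps / steps
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)  # momentum accumulator
    for _ in range(steps):
        grad = torch.zeros_like(x)
        for _ in range(n_noise):
            # random Gaussian noise around the current adversarial point
            x_noisy = (x_adv + sigma * torch.randn_like(x_adv)).clamp(0, 1).requires_grad_(True)
            loss = F.cross_entropy(model(x_noisy), y)
            grad = grad + torch.autograd.grad(loss, x_noisy)[0]
        grad = grad / n_noise                 # noise-invariant (ensemble-averaged) gradient
        grad = perona_malik(grad)             # suppress local oscillation, keep coarse structure
        g = mu * g + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)  # momentum update
        x_adv = (x_adv + alpha * g.sign()).detach()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)  # project to eps-ball
    return x_adv
```

Given a batch of ImageNet images and labels, `x_adv = attack(model, images, labels)` would then be evaluated against an adversarially trained target; adding random resizing and padding before each forward pass would correspond to the diverse inputs variant mentioned above.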
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 62072127, 62002076), the National Natural Science Foundation for Outstanding Youth Foundation (No. 61722203), Project 6142111180404 supported by CNKLSTISS, the Science and Technology Program of Guangzhou, China (No. 202002030131), the Guangdong Basic and Applied Basic Research Fund Joint Fund Youth Fund (No. 2019A1515110213), and the Educational Commission of Guangdong Province of China (2016KZDXM036).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Chen, H., Lu, K., Wang, X. et al. Generating transferable adversarial examples based on perceptually-aligned perturbation. Int. J. Mach. Learn. & Cyber. 12, 3295–3307 (2021). https://doi.org/10.1007/s13042-020-01240-1