EnsembleFool: A method to generate adversarial examples based on model fusion strategy
Introduction
Deep neural networks enable intelligent systems to achieve high accuracy on tasks of interest such as computer vision, natural language processing, speech recognition, and recommendation, considerably reducing human labour in real applications (Kurakin et al., 2017). However, attempts to defraud these models render such systems potentially unsafe. In particular, adversarial examples (Akhtar & Mian, 2018; Szegedy et al., 2014), generated by adding imperceptible perturbations to the original input, can mislead deep models into incorrect predictions (Déniz et al., 2019; Pedraza et al., 2020; Zhao et al., 2020). As Fig. 1 illustrates, a well-trained model predicts the unperturbed input as the true label with 95.62% confidence, but assigns a wrong label with 87.56% confidence when the image is corrupted by well-designed noise. The vulnerability of deep models remains a critical safety issue, and it is necessary to evaluate model robustness with respect to adversarial attacks. Hence, advanced attack techniques are expected to cooperate with the adversarial defense community to improve the safety of intelligent systems.
Existing adversarial attack methods can be roughly divided into two categories: (1) white-box attacks (Carlini & Wagner, 2017; Chen et al., 2020; Goodfellow et al., 2015; Kurakin et al., 2017) and (2) black-box attacks (Brendel et al., 2018; Chen et al., 2019; Chen et al., 2017; Ren et al., 2020; Zhao et al., 2018). The white-box setting assumes the attacker has full knowledge of the structure and parameters of the model to be attacked, which allows extremely deceptive adversarial examples to be generated. For instance, Goodfellow et al. (2015) proposed the fast gradient sign method (FGSM), which efficiently computes the perturbation from the sign of the model's gradients. Unlike the white-box setting, the black-box setting only exposes the model outputs for given adversarial inputs, while the model details are unavailable. This poses a more difficult problem than the white-box attack because of the limited information at hand. To handle it, recent research shows that adversarial examples are transferable (Liu et al., 2017), meaning that examples crafted for a given model can also fool other, unknown models. This property inspires several state-of-the-art black-box attacks (Brendel et al., 2018; Chen et al., 2017; Zhao et al., 2018). For example, MI-FGSM (Dong et al., 2018) applies iteration and momentum to FGSM to generate improved adversarial examples, achieving enhanced attack ability in both white-box and black-box cases.
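To make the FGSM idea concrete, the sketch below applies the one-step sign update to a toy logistic-regression "model" in NumPy, where the input gradient of the cross-entropy loss has a closed form. The model, weights, and values here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def fgsm_perturb(x, y, w, b, eps):
    """One-step FGSM on a toy logistic-regression model.

    x : input vector, y : label in {0, 1},
    w, b : model parameters, eps : L-inf perturbation budget.
    Returns x_adv = x + eps * sign(grad_x loss).
    """
    # Forward pass: sigmoid probability of class 1.
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
    # Gradient of the cross-entropy loss w.r.t. the input
    # simplifies to (p - y) * w for logistic regression.
    grad_x = (p - y) * w
    # FGSM: move each coordinate eps in the sign of the gradient,
    # which maximally increases the loss under an L-inf budget.
    return x + eps * np.sign(grad_x)

# Toy usage: the single sign step flips the model's prediction.
w = np.array([2.0, -3.0, 1.0])
b = 0.0
x = np.array([0.5, -0.5, 0.2])   # clean input, predicted as class 1
x_adv = fgsm_perturb(x, y=1, w=w, b=b, eps=0.8)
p_clean = 1 / (1 + np.exp(-(w @ x + b)))   # > 0.5: class 1
p_adv = 1 / (1 + np.exp(-(w @ x_adv + b)))  # < 0.5: prediction flipped
```

The same update applies to a deep network, except that the input gradient is obtained by backpropagation rather than in closed form.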
The transferability of adversarial examples endows the methods above with the ability to attack plain deep models. Unfortunately, when models are equipped with certain defense mechanisms, these methods are far less effective at fooling black-box models. For example, adversarial training (Song et al., 2020; Tramèr et al., 2018) and input modification (Cohen et al., 2019; Liu et al., 2019) are two typical ways to enhance the robustness of deep models in the black-box case. This reveals a disparity between different models, suggesting that a universal attack cannot be crafted from a single model. Instead, the characteristics of multiple models could be considered simultaneously when synthesizing the input perturbation, such that the generated example threatens all of the models.
To further investigate model diversity, we visualize the attention maps of deep models with varying architectures through CAM (Zhou et al., 2016). Fig. 2 shows that the main focuses of different models during prediction lie in different spatial regions; we illustrate this using a heatmap mask, with the focus differences marked by red boxes. This verifies that different models resist attacks in different regions of the input image and, more importantly, that the attack result of one model can be beneficial for attacking another. Hence, the input perturbation can be improved by exploiting the robustness expressed by multiple models.
Inspired by the above analyses, in this paper we propose a novel attack method named EnsembleFool. Specifically, considering that the attack map of a single model is limited and that different models possess different maps, we follow the MI-FGSM framework and integrate multiple models to enlarge the attack map, thereby enhancing attack capability. To implement a flexible integration process, we develop a series of fusion strategies based on the attack results of the previous iteration. This is motivated by the observation that the attack effect of an adversarial example is clearly expressed by the model's output, and hence the output can guide the attack in the next iteration. In this way, the adversarial examples crafted by the ensemble of multiple models exhibit enhanced attack ability in both white-box and black-box cases. The main contributions of this paper are summarized as follows:
- We investigate the transferability of adversarial examples crafted with a single model, showing that the primary focuses of different models lie in different receptive fields and that the attack map cannot be shared between models. This validates the necessity of integrating multiple models in a flexible way.
- Based on MI-FGSM, we develop a novel ensemble attack method called EnsembleFool, which produces adversarial examples by integrating the attack results of multiple models adaptively, such that the most informative attack dominates each attack iteration.
- Extensive experiments validate the state-of-the-art performance of the proposed method, even when attacking models equipped with defense mechanisms.
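The overall attack loop can be sketched as below: a momentum-iterative (MI-FGSM-style) update over an ensemble of toy logistic models, where each iteration converts the per-model losses into fusion weights. The softmax weighting rule, the toy models, and all values here are hypothetical illustrations of an adaptive fusion; the paper's actual fusion strategies are defined in its Methods section:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ensemble_attack(x, y, models, eps, steps=10, mu=1.0):
    """Momentum-iterative attack over an ensemble of toy logistic models.

    models : list of (w, b) pairs; y : label in {0, 1};
    eps : L-inf budget; steps : iterations; mu : momentum decay.
    At each step the per-model cross-entropy losses are turned into
    fusion weights via a softmax over negative losses, so a model that
    is still hard to fool (low loss) receives larger weight -- one
    plausible adaptive fusion rule, assumed here for illustration.
    """
    alpha = eps / steps           # per-step size
    g = np.zeros_like(x)          # accumulated momentum
    x_adv = x.copy()
    for _ in range(steps):
        losses, grads = [], []
        for w, b in models:
            p = np.clip(sigmoid(w @ x_adv + b), 1e-12, 1 - 1e-12)
            losses.append(-(y * np.log(p) + (1 - y) * np.log(1 - p)))
            grads.append((p - y) * w)          # d loss / d x
        # Adaptive fusion: favour the model the example fools least.
        wts = np.exp(-np.array(losses))
        wts /= wts.sum()
        grad = sum(wt * gr for wt, gr in zip(wts, grads))
        # MI-FGSM momentum update with L1-normalised gradient.
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay in the eps-ball
    return x_adv
```

A single adversarial input produced this way can flip the predictions of every model in the ensemble, which is the behaviour the transferability analysis above motivates.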
The remainder of this paper is organized as follows. The related works and the preliminary are briefly reviewed in Section 2 and Section 3, respectively. The proposed method is introduced in Section 4. The experiments are presented in Section 5, with the conclusion drawn in Section 6.
Related work
In this section, we briefly review methods relevant to the current work, including adversarial attacking, attacking using an ensemble of models, and adversarial defense. Adversarial attacking. Deep neural networks have been shown to be vulnerable to adversarial examples (Szegedy et al., 2014), creating the need for advanced attack methods to evaluate the robustness of the models. For example, Goodfellow et al. (2015) argued that the primary reason for this vulnerability is the linear nature of deep models in high-dimensional spaces.
Preliminary
Before presenting the proposed method, we first introduce the notation and preliminary knowledge for the current work. Let x denote an image, y the corresponding ground-truth label, θ the network parameters, and J the loss function. To generate the adversarial example, our goal is to maximize the loss function J(x, y; θ) under the constraint that the generated example should look visually similar to the original image, i.e., the perturbation magnitude is bounded by a small budget.
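In the usual formulation of this constraint (a sketch: the symbols x for the image, y for its label, θ for the parameters, J for the loss, δ for the perturbation, and an ℓ∞ budget ε are assumed here, since the snippet omits the original notation), the objective reads:

```latex
\max_{\delta}\; J(x + \delta,\, y;\, \theta)
\quad \text{subject to} \quad \|\delta\|_{\infty} \le \epsilon
```

FGSM solves a one-step linearization of this problem, which yields the sign-of-gradient update, and iterative variants such as MI-FGSM refine it over multiple steps while projecting back into the ε-ball.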
Methods
In this section, we introduce the proposed adaptive ensemble model in detail, including its variants with manual and dynamic adaptivity.
Experiments
In this section, we conduct a series of experiments on public datasets to validate the effectiveness of the proposed method by comparing it with state-of-the-art methods.
Conclusions
Different deep models exhibit different characteristics of vulnerability when attacked, and exploiting this property helps to generate more powerful adversarial examples. Starting from this observation, in this paper we propose a novel adaptive ensemble-based adversarial attack method that employs an adaptive strategy to fuse the information of multiple models. Specifically, the model prediction reveals how correct and confident the model is with respect to the input, which informs us to use the model outputs to guide the fusion at each attack iteration.
Declaration of Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Wenyu Peng: Conceptualization, Methodology. Renyang Liu: Writing - original draft. Ruxin Wang: Writing - review & editing. Taining Cheng: Data curation. Zifeng Wu: Validation. Li Cai: Visualization, Investigation. Wei Zhou: Supervision.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants 61762089, 61663047, 61863036, 61762092, 71972165, and 61763048, by the Yunnan Applied Basic Research Projects under Grant No. 202001BB050034, and by the Yunnan Province Science Foundation for Youths under Grant No. 202005AC160007.
References (34)
- et al. POBA-GA: perturbation optimized black-box adversarial attacks via genetic algorithm. Comput. Secur. (2019)
- et al. RCA-SOC: a novel adversarial defense by refocusing on critical areas and strengthening object contours. Comput. Secur. (2020)
- et al. Query-efficient label-only attacks against black-box machine learning models. Comput. Secur. (2020)
- et al. Threat of adversarial attacks on deep learning in computer vision: a survey. IEEE Access (2018)
- et al. Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. ICML (2018)
- et al. Decision-based adversarial attacks: reliable attacks against black-box machine learning models. ICLR (2018)
- et al. Towards evaluating the robustness of neural networks. SP (2017)
- et al. ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security (2017)
- et al. Certified adversarial robustness via randomized smoothing. ICML (2019)
- et al. Adversarial examples are a manifestation of the fitting-generalization trade-off. IWANN (2019)
- Robustness to adversarial examples can be improved with overfitting. Int. J. Mach. Learn. Cybern.
- Boosting adversarial attacks with momentum. CVPR
- Evading defenses to transferable adversarial examples by translation-invariant attacks. CVPR
- Explaining and harnessing adversarial examples. ICLR
- Countering adversarial images using input transformations. ICLR
- Adversarial examples in the physical world. ICLR
- Defense against adversarial attacks using high-level representation guided denoiser. CVPR
Wenyu Peng received the BE degree in network engineering from Yunnan University in 2019. She is a master degree candidate at the School of Software, Yunnan University. Her research interests include deep learning, adversarial attack and bio-informatics.
Renyang Liu received the BE degree in Computer Science from Northwest Normal University in 2017. He is currently a Ph.D. candidate in the School of Information Science and Engineering at Yunnan University, Kunming, China. His current research interests include deep learning, adversarial attack, and graph learning.
Ruxin Wang is currently an associate professor with the National Pilot School of Software, Yunnan University, Kunming, China. He received his BEng from Xidian University, his MSc from Huazhong University of Science and Technology, and his PhD degree from the University of Technology Sydney. His research interests include image restoration, deep learning, and computer vision. He focuses on the topic of image synthesis using both discriminative and generative models. He has authored and co-authored 20+ research papers in venues including IEEE T-NNLS, T-IP, T-Cyb, ICCV, and AAAI. He has received "the 1000 Talents Plan for Young Talents of Yunnan Province" award.
Taining Cheng received the bachelor's degree from Yunnan University, Yunnan, China, in 2019. He is currently working toward the master's degree in the School of Software at Yunnan University. His current research interests include learned indexes and distributed computing.
Zifeng Wu received the BE degree in network engineering from Yunnan University in 2019. He is a master's degree candidate at the School of Software, Yunnan University. His current research interests include machine learning, model compression, and acceleration.
Li Cai was born in Kunming, China, in 1975. She received the M.S. degree in computer application from Yunnan University, China, in 2007. Then, she received the Ph.D. degree with the School of Computer Science, Fudan University, China, in 2020. From 1997 to 2002, she was a Research Assistant with Network Center. Since 2010, she has been an Associate Professor with the School of Software, Yunnan University. Her research interests include intelligent transportation, machine learning, visualization, and data quality.
Wei Zhou received the Ph.D. degree from the University of Chinese Academy of Sciences. He is currently a Full Professor with the Software School, Yunnan University. His current research interests include distributed data-intensive computing and bio-informatics. He is currently a Fellow of the China Communications Society, a member of the Yunnan Communications Institute, and a member of the Bioinformatics Group of the Chinese Computer Society. He won the Wu Daguan Outstanding Teacher Award of Yunnan University in 2016, and was selected into the Youth Talent Program of Yunnan University in 2017. He has hosted a number of National Natural Science Foundation projects.