Generating adversarial examples with input significance indicator
Introduction
Deep learning has become the core of current machine learning and artificial intelligence. Thanks to its powerful learning, feature extraction and modeling capabilities, it has been widely applied to challenging areas, such as malware detection [5], speech synthesis [26], and semantic analysis [37]. In the area of computer vision, deep learning has become the main force for various applications such as self-driving cars [29], facial recognition [38] and image segmentation [36].
However, researchers have discovered that existing neural networks are vulnerable to attacks. Szegedy et al. first observed that despite their high accuracy, modern deep networks are surprisingly susceptible to adversarial examples in the form of small perturbations to images that remain almost imperceptible to the human eye [32]. The profound implications of the existence of adversarial examples triggered widespread interest in adversarial attacks and their defenses for deep learning in general [11], [17], [24], [30]. Continued work on adversarial examples is of practical value, as they can expose the vulnerabilities of deep learning models before deployment in real-world applications. On the other hand, the robustness of DNNs can be improved by providing adversarial examples along with benign samples during the training stage [13], [25].
Since Szegedy et al. generated adversarial examples with the box-constrained L-BFGS method [32], many adversarial attack methods have been proposed, both to understand the attacks and to improve model defensibility. Goodfellow et al. asserted that the design traits of modern deep neural networks that encourage linear behavior for computational gains have the side-effect of making them susceptible to simple analytical perturbations, and based on this hypothesis proposed the fast gradient sign method (FGSM) [10]. Meanwhile, Kurakin et al. proposed the basic iterative method (BIM), the iterative version of the single-step FGSM, which takes multiple small steps and adjusts the direction after each one [13]. Madry et al. pointed out that BIM is equivalent to (the l∞ version of) projected gradient descent (PGD), a standard convex optimization method [15]. Moosavi-Dezfooli et al. proposed DeepFool to find the closest distance from the original input to the decision boundary [18]. Papernot et al. created an adversarial attack named the Jacobian-based Saliency Map Attack (JSMA) by restricting the l0-norm of the perturbations [21]. In practice, this means the goal is to modify only a few pixels in the image, rather than perturbing the whole image, to fool the classifier.
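The single-step and iterative gradient-sign updates described above can be sketched as follows. This is an illustrative, numpy-only sketch, not the implementation evaluated in this paper; `grad_fn` is a hypothetical callable returning the gradient of the loss with respect to the input, and pixel values are assumed to lie in [0, 1].

```python
import numpy as np

def fgsm(x, grad_fn, eps):
    """Single-step FGSM: move each pixel by eps in the sign of the loss gradient."""
    x_adv = x + eps * np.sign(grad_fn(x))
    return np.clip(x_adv, 0.0, 1.0)

def bim(x, grad_fn, eps, alpha, steps):
    """BIM (l-inf PGD): repeated small FGSM steps, projected back into the eps-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball around x
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in the valid pixel range
    return x_adv
```

The projection step is what makes BIM an l∞-bounded attack: however many iterations are run, no pixel moves farther than eps from its original value.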
Although JSMA is widely accepted as one of the benchmarks of white-box attacks [1], [35], it has not been fully investigated because of the huge computational cost of constructing and updating the saliency map in each iteration. Carlini et al. argued that JSMA is unable to run on ImageNet, where images are 299 × 299 × 3 vectors, implying 2^36 work is required on each step of the calculation [4]. This high cost also makes full comparison with other attacks, as well as adversarial-training evaluation, difficult to implement [4], [7], [35].
We therefore propose a modification of JSMA that brings its cost down to a tolerable level while retaining its effectiveness. Instead of the complete Jacobian matrix used in JSMA, we use only the partial derivatives of the target output with respect to the input features. We calculate the sensitivity of each input element to the target classification to determine how much each element affects the final decision. The partial derivatives of the most sensitive elements are subsequently used to generate the adversarial example. Accordingly, we refer to this method as the Input Sensitivity-based Attack (ISA).
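The core of this idea can be sketched in a few lines. This is a simplified, numpy-only illustration under stated assumptions: `target_grad` is a hypothetical callable returning the partial derivatives of the target-class output with respect to the input, and inputs are scaled so that the extreme pixel values are −1 and +1.

```python
import numpy as np

def isa_step(x, target_grad, k=1):
    """One ISA-style step (illustrative): perturb the k features whose
    target-class partial derivative has the largest magnitude, pushing each
    toward the extreme value (+1 or -1) given by the sign of its derivative."""
    g = target_grad(x)                        # d f_t / d x, same shape as x
    idx = np.argsort(np.abs(g).ravel())[-k:]  # indices of the k most sensitive features
    x_adv = x.copy().ravel()
    x_adv[idx] = np.sign(g.ravel()[idx])      # one-shot modification to +/-1
    return x_adv.reshape(x.shape)
```

Compared with JSMA, only one gradient vector per iteration is needed rather than the full Jacobian over all output classes, which is where the cost saving comes from.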
Besides the computational problem, Shrikumar et al. observed the saturation problem of gradients [27]: activation functions such as Rectified Linear Units (ReLUs) have a gradient of zero when they are not firing, and yet a ReLU that does not fire can still carry information. Similarly, sigmoid or tanh activations also have a near-zero gradient at high or low inputs even though such inputs can be very significant. When gradients are used as the indicator for generating adversarial examples, the resulting perturbation may therefore be insufficient, or may be applied where none is needed.
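The saturation effect is easy to demonstrate numerically. The short sketch below, using a sigmoid unit, shows a strongly activated (and therefore highly informative) input receiving almost no gradient signal, which is exactly why a purely gradient-based significance indicator can mislead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# A saturated unit: the activation is decisive (~1.0), yet the gradient
# flowing back through it is nearly zero, so a gradient-based indicator
# would rank this input as unimportant.
z_saturated, z_linear = 8.0, 0.0
assert sigmoid(z_saturated) > 0.999       # highly significant activation
assert sigmoid_grad(z_saturated) < 1e-3   # but almost no gradient signal
assert sigmoid_grad(z_linear) == 0.25     # mid-range input: full gradient
```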
To address the saturation problem of gradient methods, we view relevance as another input significance indicator whose calculation is activation-independent. A relevance score is assigned to every input feature, indicating its contribution to the final prediction. We then propose an l0 attack named the Input Relevance-based Attack (IRA), which searches for and perturbs the most relevant feature according to the obtained relevance scores. Because the l0 adversary introduces relatively recognizable perturbations and is limited to specific applications [34], we also design an iterative strategy that penalizes the l∞ norm of the distortion.
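Relevance scores of this kind are typically computed by layer-wise relevance propagation (LRP)-style rules, as in the work cited in the references; the exact propagation rule used by IRA is detailed later in the paper. As a minimal sketch only, the standard LRP ε-rule for a single linear layer redistributes the output relevance to inputs in proportion to each input's contribution to the pre-activation, independently of the activation's gradient:

```python
import numpy as np

def lrp_linear(x, w, b, relevance_out, eps=1e-6):
    """LRP epsilon-rule sketch for one linear layer z = w @ x + b:
    input j receives relevance proportional to its contribution w_ij * x_j
    to each pre-activation z_i; eps stabilizes small denominators."""
    z = w @ x + b                         # pre-activations, shape (m,)
    denom = z + eps * np.sign(z)          # stabilized denominator
    contrib = w * x[np.newaxis, :]        # w_ij * x_j, shape (m, n)
    return (contrib / denom[:, np.newaxis]).T @ relevance_out
```

Note that the rule depends on the forward contributions w_ij * x_j rather than on backward gradients, so a saturated unit with zero gradient can still pass relevance to its inputs.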
We have evaluated the proposed attack under untargeted and targeted scenarios on popular datasets, including MNIST [14], CIFAR10 [12] and ImageNet [6], in comparison with several baseline adversarial attacks.
ISA substantially reduces the heavy computation of JSMA: experimental results show that our attack runs 5× to 10× faster than JSMA while being equally effective.
By providing a more accurate indicator, IRA improves attack performance at the cost of additional computation. On MNIST, when at most 60 pixel changes are permitted, IRA performs best, with the highest success rate (96.5%) and the smallest average distortion (23 pixels), compared to JSMA with a 90% success rate and 38 changed pixels on average. Moreover, the l∞ extension of IRA introduces more imperceptible perturbations with a higher success rate than the popular l∞ attack, BIM. Both absolute-distance and human-perception metrics confirm the effectiveness of the proposed attack.
This paper makes the following contributions:
- To reduce the heavy computation of JSMA, we propose the Input Sensitivity-based Attack (ISA) to achieve a fast l0 adversary. Experimental comparisons with JSMA show that ISA largely preserves its effectiveness at a much lower computational cost.
- We adopt the Input Relevance-based Attack (IRA) to avoid the saturation problem of gradients. Both the l0 and l∞ variants of IRA outperform the baseline methods under various evaluation metrics.
- We compare the computational cost of ISA and IRA. Our theoretical and experimental analyses demonstrate that ISA runs faster, while IRA achieves superior performance at a slight sacrifice in runtime.
The remainder of this paper is arranged as follows. In Section 2, we provide the preliminaries of the threat model, as well as theories of a few attack methods. Section 3 introduces our attack based on input significance indicators, including input sensitivity-based attack (ISA) and input relevance-based attack (IRA), as well as the complexity comparison between them. Experimental results in untargeted and targeted scenarios are presented in Section 4. Finally, we provide some concluding remarks and suggestions for future work in Section 5.
Threat model
We discuss the threat model in the context of image classification tasks, focusing on attacks against models built with deep neural networks due to their strong performance in this domain.
Typically, a neural network of l layers is a function f that accepts an input x and produces an output y = f(x), formed as a chain a(k) = σ(w(k)a(k−1) + b(k)) for k = 1, …, l, with a(0) = x, where σ(·) is some activation function, and w(k), a(k), b(k) represent the weight matrix, activation and bias of the k-th layer, respectively.
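The layer chain can be written directly as a loop over (weight, bias) pairs. This is a generic, numpy-only sketch of the forward pass defined above, not code from the paper:

```python
import numpy as np

def forward(x, weights, biases, sigma=np.tanh):
    """Evaluate an l-layer network as the chain
    a(k) = sigma(w(k) @ a(k-1) + b(k)), with a(0) = x."""
    a = x
    for w, b in zip(weights, biases):
        a = sigma(w @ a + b)
    return a
```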
Attack process
The proposed attack aims to find and perturb the most significant input feature according to an indicator. The indicator can be considered as a function of the image x and the target t, returning a set of significance scores, each corresponding to an input element xi. Fig. 1 depicts the basic workflow using sensitivity and relevance as the indicators. As input, the attack takes a benign sample x, a target t, and a well-trained DNN f. It returns an adversarial sample x′ and proceeds in three basic steps.
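The overall loop can be sketched as follows. This is an illustrative outline only, with hypothetical names: `predict` returns the model's decision for an input, and `indicator` stands for either the sensitivity or the relevance scoring function; inputs are assumed scaled so extreme pixel values are −1 and +1.

```python
import numpy as np

def significance_attack(x, target, predict, indicator, max_changes=60):
    """Generic workflow of the proposed attacks: score every input feature
    with the chosen indicator, perturb the most significant untouched
    feature toward +/-1, and stop once the model outputs the target class
    or the change budget is exhausted."""
    x_adv = x.copy().ravel()
    touched = np.zeros(x_adv.size, dtype=bool)
    for _ in range(max_changes):
        if predict(x_adv.reshape(x.shape)) == target:
            break  # success: target class reached
        scores = indicator(x_adv.reshape(x.shape), target).ravel()
        masked = np.where(touched, -np.inf, np.abs(scores))
        i = int(np.argmax(masked))             # most significant unmodified feature
        x_adv[i] = np.sign(scores[i]) if scores[i] != 0 else 1.0
        touched[i] = True
    return x_adv.reshape(x.shape)
```

Each feature is modified at most once (the one-shot rule), which keeps the number of changed pixels, i.e. the l0 norm of the perturbation, bounded by the budget.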
Experiment setup
We compare our l0 attacks, the input sensitivity-based attack (ISA) and the input relevance-based attack (IRA), with JSMA. The selected significant feature is set to either −1 or +1 according to the sign of the indicator.
For JSMA, both forms of perturbation are utilized: increasing pixel intensity (θ > 0) and decreasing pixel intensity (θ < 0), which we refer to as JSMA+ and JSMA−, respectively. θ is set to twice the value used in the original paper because the input values here span a range twice as wide.
Conclusion
We have presented in this paper white-box attacks based on input sensitivity (ISA) and input relevance (IRA). Besides the one-shot modification to each pixel, which minimizes the number of changed features, we design an l∞ perturbation rule for IRA that introduces broad but imperceptible distortions. We also compare them with state-of-the-art attacks through the evaluation of various metrics as well as time cost. Specifically, our l0 attacks are significantly faster than JSMA, reducing the runtime by a factor of 5 to 10.
CRediT authorship contribution statement
Xiaofeng Qiu: Conceptualization, Methodology, Resources, Investigation, Writing - review & editing, Supervision. Shuya Zhou: Software, Formal analysis, Investigation, Validation, Writing - original draft, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (38)
- Generate adversarial examples by spatially perturbing on the meaningful area, Pattern Recognit. Lett. (2019)
- Review and comparison of methods to study the contribution of variables in artificial neural network models, Ecol. Modell. (2003)
- Explaining nonlinear classification decisions with deep Taylor decomposition, Pattern Recognit. (2017)
- Understanding adversarial training: increasing local stability of supervised models through robust optimization, Neurocomputing (2018)
- Facial expression recognition via learning deep sparse autoencoders, Neurocomputing (2018)
- Threat of adversarial attacks on deep learning in computer vision: a survey, IEEE Access (2018)
- On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PLoS One (2015)
- How to explain individual classification decisions, J. Mach. Learn. Res. (2010)
- Towards evaluating the robustness of neural networks, 2017 IEEE Symposium on Security and Privacy (SP) (2017)
- Detection of malicious code variants based on deep learning, IEEE Trans. Ind. Inform. (2018)
- ImageNet: a large-scale hierarchical image database, 2009 IEEE Conference on Computer Vision and Pattern Recognition
- Boosting adversarial attacks with momentum, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Explaining and harnessing adversarial examples
- Towards deep neural network architectures robust to adversarial examples
- Learning multiple layers of features from tiny images, Technical Report
- Adversarial examples in the physical world, 5th International Conference on Learning Representations (ICLR 2017), Toulon, France, Workshop Track Proceedings
- Towards deep learning models resistant to adversarial attacks, 6th International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, Conference Track Proceedings
- Universal adversarial perturbations, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Xiaofeng Qiu is currently an associate professor at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. Her current research interests include network security and data mining.
Shuya Zhou is a Master's student at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. His current research interests cover adversarial attacks and data mining.