Neurocomputing, Volume 394, 21 June 2020, Pages 1-12

Generating adversarial examples with input significance indicator

https://doi.org/10.1016/j.neucom.2020.01.040

Abstract

Many adversarial attacks have been proposed to identify vulnerabilities in existing DNNs and improve their robustness. However, an indispensable benchmark, the Jacobian-based Saliency Map Attack (JSMA), suffers from computational issues that limit its application. In this paper, we propose to generate adversarial examples based on an input significance indicator, with two indicators, sensitivity and relevance, used to find the most sensitive or relevant input element by backpropagating scores from the DNN's output. Our experimental results show that the sensitivity-based attack alleviates the heavy computational burden of JSMA while retaining its effectiveness, whereas the relevance-based indicator avoids the saturation problem of gradients at a slight cost in runtime, thus reducing the magnitude of the perturbations while achieving a higher attack success rate.

Introduction

Deep learning has become the core of current machine learning and artificial intelligence. Thanks to its powerful learning, feature extraction and modeling capabilities, it has been widely applied to challenging areas, such as malware detection [5], speech synthesis [26], and semantic analysis [37]. In the area of computer vision, deep learning has become the main force for various applications such as self-driving cars [29], facial recognition [38] and image segmentation [36].

However, researchers have discovered that existing neural networks are vulnerable to attacks. Szegedy et al. first observed that despite their high accuracy, modern deep networks are surprisingly susceptible to adversarial examples in the form of small perturbations to images that remain almost imperceptible to the human eye [32]. The profound implications of the existence of adversarial examples triggered widespread interest in adversarial attacks and their defenses for deep learning in general [11], [17], [24], [30]. Continued work on adversarial examples is worthwhile, as they can expose the vulnerabilities of deep learning models before deployment in real-world applications. On the other hand, DNNs' robustness can be improved by providing adversarial examples along with benign samples during the training stage [13], [25].

Since Szegedy et al. generated adversarial examples with the box-constrained L-BFGS method [32], many adversarial attack methods have been proposed to understand such attacks and improve models' defenses. Goodfellow et al. asserted that the design traits of modern deep neural networks that encourage linear behavior for computational gains have the side effect of making them susceptible to simple analytical perturbations, and proposed the fast gradient sign method (FGSM) based on this hypothesis [10]. Meanwhile, Kurakin et al. proposed the basic iterative method (BIM), the iterative version of the single-step FGSM, which takes multiple small steps while adjusting the direction after each step [13]. Madry et al. pointed out that BIM is equivalent to (the ℓ∞ version of) projected gradient descent (PGD), a standard convex optimization method [15]. Moosavi-Dezfooli et al. proposed DeepFool to find the closest distance from the original input to the decision boundary of adversarial examples [18]. Papernot et al. created an adversarial attack named the Jacobian-based Saliency Map Attack (JSMA) by restricting the ℓ0-norm of the perturbations [21]. In practice, this means the goal is to modify only a few pixels in the image instead of perturbing the whole image to fool the classifier.
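For orientation, the single-step gradient-sign idea can be written in a few lines; the sketch below assumes a PyTorch classifier and is an illustration with an arbitrary step size ε, not the authors' code.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """Single-step FGSM sketch: move every input element by eps along the
    sign of the loss gradient. model, x, y and eps are illustrative."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Perturb in the direction that increases the classification loss.
    x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0, 1)
    return x_adv.detach()
```

BIM/PGD simply repeat this update with a smaller step, projecting back into the allowed ℓ∞ ball after each iteration.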

Whereas JSMA is widely accepted as one of the benchmarks of white-box attacks [1], [35], it has not been fully investigated because of the huge computational cost of constructing and updating the saliency map in each iteration. Carlini et al. argued that JSMA is unable to run on ImageNet, where images are 299 × 299 × 3 vectors, implying 2^36 work on each step of the calculation [4]. This high cost also makes thorough comparisons with other attacks, as well as adversarial-training evaluations, difficult to carry out [4], [7], [35].

We therefore develop a modification of JSMA that brings its cost down to a tolerable level while retaining its effectiveness. We propose to use only the partial derivatives of the target output with respect to the input features, rather than the complete Jacobian matrix as in JSMA. We calculate the sensitivity of each input element to the target classification to determine how much each element affects the final decision. The partial derivatives of the most sensitive elements are then used to generate the adversarial example. We refer to this method as the Input Sensitivity-based Attack (ISA).
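As an illustration, one step of this idea can be sketched as follows, assuming a PyTorch classifier and the ±1 pixel convention used in our experiment setup; the helper below is a simplification, not the released implementation.

```python
import torch

def isa_step(model, x, target):
    """Perturb the single input feature most sensitive to the target class.

    Sensitivity is the partial derivative of the target logit with respect
    to each input element; only this one row of the Jacobian is computed,
    instead of the full Jacobian used by JSMA.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    logits = model(x_adv)                # shape (1, num_classes)
    logits[0, target].backward()         # gradient of the target logit only
    grad = x_adv.grad.view(-1)
    idx = grad.abs().argmax()            # most sensitive input element
    flat = x_adv.detach().view(-1).clone()
    flat[idx] = torch.sign(grad[idx])    # push the pixel to -1 or +1
    return flat.view_as(x)
```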

Besides the computational problem, Shrikumar et al. observed the saturation problem of gradients [27]: activation functions such as Rectified Linear Units (ReLUs) have a gradient of zero when they are not firing, and yet a ReLU that does not fire can still carry information. Similarly, sigmoid and tanh activations have near-zero gradients at high or low inputs even though such inputs can be very significant. When gradients are used as the indicator for generating adversarial examples, the resulting perturbations may therefore be insufficient or even unnecessary.
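The effect is easy to reproduce: for a ReLU unit that is not firing, the gradient with respect to its input is exactly zero, so a gradient-based indicator reports the input as irrelevant even though a larger change to it would alter the output. A toy illustration (values chosen arbitrarily):

```python
import torch

x = torch.tensor([-0.5], requires_grad=True)
w, b = torch.tensor([2.0]), torch.tensor([0.0])
y = torch.relu(w * x + b)   # ReLU is not firing: y == 0
y.backward()
print(x.grad)               # tensor([0.]) -- the gradient signal vanishes,
                            # yet pushing x above 0 would change the output
```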

To address the saturation problem of gradient-based methods, we use relevance as another input significance indicator whose calculation is activation-independent. A relevance score is assigned to every input feature indicating its contribution to the final prediction. We then propose an ℓ0 attack named the Input Relevance-based Attack (IRA) that searches for and perturbs the most relevant feature according to the obtained relevance scores. Because the ℓ0 adversary introduces relatively recognizable perturbations and is limited to specific applications [34], we also design an iterative strategy that penalizes the ℓ∞ norm of the distortions.
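One way to obtain such activation-independent scores is to backpropagate a relevance quantity layer by layer. The sketch below applies the ε-rule of layer-wise relevance propagation to a toy two-layer ReLU network purely as an illustration; the weights, sizes, and the choice of the ε-rule are assumptions here, and the propagation rule actually used by IRA is described in Section 3.

```python
import numpy as np

def lrp_linear(a, w, b, relevance_out, eps=1e-6):
    """epsilon-rule relevance propagation through one linear layer.

    a: input activations (n,), w: weights (n, m), b: bias (m,),
    relevance_out: relevance of the layer's outputs (m,).
    Returns the relevance of the layer's inputs (n,).
    """
    z = a @ w + b                              # pre-activations (m,)
    z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilizer avoids division by zero
    s = relevance_out / z                      # per-output scaling (m,)
    return a * (w @ s)                         # contribution of each input (n,)

# Toy two-layer ReLU network; weights are random placeholders.
rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(784, 100)), np.zeros(100)
w2, b2 = rng.normal(size=(100, 10)), np.zeros(10)

x = rng.random(784)
a1 = np.maximum(x @ w1 + b1, 0.0)              # hidden ReLU activations
logits = a1 @ w2 + b2

target = 3
r_out = np.zeros(10)
r_out[target] = logits[target]                 # start from the target score
r_hidden = lrp_linear(a1, w2, b2, r_out)
r_input = lrp_linear(x, w1, b1, r_hidden)      # relevance of every input pixel
```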

We have evaluated the proposed attack under untargeted and targeted scenarios on popular datasets, including MNIST [14], CIFAR10 [12] and ImageNet [6], in comparison with several baseline adversarial attacks.

ISA is shown to greatly reduce the heavy computation of JSMA. Experimental results show that our attack runs 5× to 10× faster than JSMA while being equally effective.

By providing a more accurate indicator, IRA improves attack performance at the cost of additional computation. On MNIST, IRA performs best, with the highest success rate (96.5%) and the smallest average distortion (23 pixels) when at most 60 pixel changes are permitted, compared with JSMA's 90% success rate and 38 changed pixels on average. Moreover, the ℓ∞ extension of IRA introduces more imperceptible perturbations with a higher success rate than the popular ℓ∞ attack, BIM. Both absolute-distance and human-perception metrics confirm the effectiveness of the proposed attack.

This paper makes the following contributions:

  • In order to reduce the heavy computation of JSMA, we propose the Input Sensitivity-based Attack (ISA) to achieve a fast ℓ0 adversary. Experimental comparison with JSMA shows that ISA largely preserves effectiveness at low computational complexity.

  • We adopt the Input Relevance-based Attack (IRA) to avoid the saturation problem of gradients. Both the ℓ0 and ℓ∞ versions of IRA are shown to outperform baseline methods under various evaluation metrics.

  • We compare the computational costs of ISA and IRA. Our theoretical and experimental analyses demonstrate that ISA runs faster, while IRA achieves superior performance with a slight sacrifice in runtime.

The remainder of this paper is arranged as follows. In Section 2, we provide the preliminaries of the threat model, as well as theories of a few attack methods. Section 3 introduces our attack based on input significance indicators, including input sensitivity-based attack (ISA) and input relevance-based attack (IRA), as well as the complexity comparison between them. Experimental results in untargeted and targeted scenarios are presented in Section 4. Finally, we provide some concluding remarks and suggestions for future work in Section 5.


Threat model

We discuss the threat model in the context of image classification tasks, focusing on attacks against models built with deep neural networks due to their strong performance.

Typically, a neural network of l layers is a function that accepts an input x ∈ ℝ^n and produces an output f(x) ∈ ℝ^m, formed as a chain f(x) = softmax(f^(l-1)(⋯f^(2)(f^(1)(x))⋯)), where f^(k)(x) = σ((w^(k))^T a^(k) + b^(k)), σ(·) is some activation function, and w^(k), a^(k), b^(k) represent the weight matrix, activation and bias of the k-th layer.
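For concreteness, such a chain corresponds to an ordinary feed-forward stack; the layer sizes below are arbitrary placeholders for a 10-class classifier.

```python
import torch.nn as nn

# A chain of affine layers with activations, followed by softmax,
# matching f(x) = softmax(f^(l-1)( ... f^(2)(f^(1)(x)) ... )).
f = nn.Sequential(
    nn.Linear(784, 300), nn.ReLU(),   # f^(1)
    nn.Linear(300, 100), nn.ReLU(),   # f^(2)
    nn.Linear(100, 10),               # f^(l-1): class scores
    nn.Softmax(dim=-1),               # output probabilities over m classes
)
```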

Attack process

The proposed attack aims to find and perturb the most significant input feature according to an indicator. The indicator can be considered as a function of the image x and the target t, returning a set of significance scores, each corresponding to an input element x_i. Fig. 1 depicts the basic workflow using sensitivity and relevance as the indicators. As input, the attack takes a benign sample x, a target t, and a well-trained DNN f. It returns an adversarial sample x′ and proceeds in three basic steps.
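Schematically, the workflow can be summarized as follows. This is a simplified sketch: `indicator` stands for either the sensitivity or the relevance scoring function, the ±1 update and the 60-change budget follow the ℓ0 setup used in our experiments, and everything else is illustrative.

```python
import torch

def significance_attack(f, x, t, indicator, max_changes=60):
    """Iteratively perturb the most significant feature until f predicts t.

    f: trained classifier, x: benign sample, t: target class,
    indicator(f, x, t): returns one significance score per input element.
    """
    x_adv = x.clone()
    for _ in range(max_changes):
        if f(x_adv).argmax(dim=-1).item() == t:    # step 3: check success
            return x_adv
        scores = indicator(f, x_adv, t).view(-1)   # step 1: score every element
        idx = scores.abs().argmax()                # step 2: pick the most significant
        flat = x_adv.view(-1)
        flat[idx] = torch.sign(scores[idx])        # perturb it to -1 or +1
        # A full implementation would avoid re-selecting already modified pixels.
    return x_adv                                   # budget exhausted
```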

Experiment setup

We compare our ℓ0 attacks, the input sensitivity-based attack (ISA) and the input relevance-based attack (IRA), with JSMA. The selected significant feature is set to either −1 or +1 according to the sign of the indicator.

For JSMA, both forms of perturbation, increasing pixel intensity (θ = +2) and decreasing pixel intensity (θ = −2), are utilized; we refer to them as JSMA+ and JSMA−, respectively. θ is set to twice the value used in the original paper because the input values

Conclusion

We have presented in this paper white-box attacks based on input sensitivity (ISA) and relevance (IRA). Besides the one-shot modification of each pixel, which aims to change as few features as possible, we design an ℓ∞ perturbation rule for IRA that introduces broad but imperceptible distortions. We also compare them with state-of-the-art attacks through the evaluation of various metrics as well as time cost. Specifically, our ℓ0 attacks are significantly faster than JSMA, reducing the runtime

CRediT authorship contribution statement

Xiaofeng Qiu: Conceptualization, Methodology, Resources, Investigation, Writing - review & editing, Supervision. Shuya Zhou: Software, Formal analysis, Investigation, Validation, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Xiaofeng Qiu is currently an associate professor at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. Her current research interests include network security and data mining.

Shuya Zhou is a Master's student at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. His current research interests cover adversarial attacks and data mining.

References (38)

  • J. Deng et al., ImageNet: a large-scale hierarchical image database, 2009 IEEE Conference on Computer Vision and Pattern Recognition (2009)
  • Y. Dong et al., Boosting adversarial attacks with momentum, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)
  • I.J. Goodfellow et al., Explaining and harnessing adversarial examples
  • S. Gu et al., Towards deep neural network architectures robust to adversarial examples
  • A. Krizhevsky et al., Learning multiple layers of features from tiny images, Technical Report (2009)
  • A. Kurakin et al., Adversarial examples in the physical world, 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Workshop Track Proceedings (2017)
  • Y. LeCun, The MNIST database of handwritten digits, ...
  • A. Madry et al., Towards deep learning models resistant to adversarial attacks, 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings (2018)
  • S.-M. Moosavi-Dezfooli et al., Universal adversarial perturbations, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)