Neurocomputing

Volume 500, 21 August 2022, Pages 135-142

Leveraging transferability and improved beam search in textual adversarial attacks

https://doi.org/10.1016/j.neucom.2022.05.054

Abstract

Adversarial attacks in NLP are difficult to ward off because of the discrete and highly abstract nature of human languages. Prior works utilize different word replacement strategies to generate semantic-preserving adversarial texts. These query-based methods, however, explore only a limited portion of the search space. To explore the search space more fully, we use an improved beam search with multiple random perturbation positions. In addition, we exploit the transferable vulnerability exposed by surrogate models to choose vulnerable candidate words for target models. We empirically show that beam search with multiple random attacking positions works better than the commonly used greedy search with word importance ranking. Extensive experiments on three popular datasets demonstrate that our method outperforms three advanced attack methods under black-box settings. We provide ablation studies that clearly show the effectiveness of our improved beam search, which achieves a higher success rate than the greedy approach under the same query budget.

Introduction

Deep neural networks (DNNs) have achieved results that exceed human performance in many natural language processing (NLP) tasks [1], [2], [3], but the predictions of the models used in these tasks cannot be explained in a way that humans can understand, revealing the models' lack of interpretability [4]. Relatedly, recent studies have shown that DNNs can easily be deceived into making wrong decisions by crafted adversarial examples [5], [6], which is one of the major vulnerabilities of DNNs.

In this paper, we focus on adversarial attacks under black-box settings, i.e., where any access to the model is limited to its inputs and outputs. The study of black-box attacks is more challenging and more realistic [7] than that of white-box attacks.

In the NLP field, existing black-box adversarial attack methods can be roughly divided into two categories: 1) query-based attacks find vulnerable tokens by querying the target model's output decisions and scores, and then apply different strategies to these tokens to generate adversarial texts; 2) transfer-based attacks utilize a surrogate model to approximate the decision boundary of the target model and perform gradient-based attacks on the surrogate model. Since adversarial examples have been shown to be transferable [8], [9], [10], such examples can also fool the target model.

These existing methods, however, have limitations. On the one hand, query-based attacks require a large number of queries to the target model. Moreover, unlike continuous data, the discrete nature of tokens makes it practically impossible to apply continuous optimization algorithms to the search for vulnerabilities, so vulnerable tokens can only be found by exhaustive search [11]. On the other hand, transfer-based attacks are limited by the transferability of the generated adversarial examples and often achieve only a low success rate.

There are also difficulties inherent in textual adversarial attacks themselves. First, text data is discrete; tokens are usually vectorized before being fed to DNNs, and applying gradient-based adversarial attacks to these vectors may produce invalid, out-of-vocabulary tokens in the generated adversarial examples [12], [13], [14]. Second, changing even one character or word may introduce semantic changes and grammatical errors that can be easily detected.

To tackle the above problems, we combine the advantages of the two categories of methods. We apply a query-based search to the word embeddings of a surrogate model, using a combination of the target model's prediction scores and semantic similarity as the objective function, to efficiently generate textual adversarial examples.
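
As an illustration, a candidate-scoring objective of this general form might look like the following sketch; the additive weighting and all names here are assumptions for exposition, not the paper's exact formulation.

```python
# A minimal sketch of an objective that combines the target model's prediction
# scores with semantic similarity; the weighting scheme is an illustrative
# assumption, not the paper's exact formula.
def attack_objective(target_probs, true_label, similarity, weight=1.0):
    """Score a candidate adversarial text.

    target_probs: class probabilities returned by the (black-box) target model
    true_label:   index of the ground-truth class
    similarity:   semantic similarity between original and candidate text
    weight:       trade-off between misclassification and semantic preservation
    """
    # A lower probability on the true class means the candidate is closer
    # to fooling the target model.
    misclassification_score = 1.0 - target_probs[true_label]
    return misclassification_score + weight * similarity
```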

Our contributions are summarized as follows:

1. We propose a black-box adversarial attack method that leverages an improved beam search and transferability from surrogate models, and can efficiently generate semantic-preserving adversarial texts.

2. We evaluate our method on three popular datasets and four neural networks. Our method outperforms three advanced methods in automatic evaluation.

3. We provide extensive ablation studies on the substitution strategy, revealing that beam search with multiple random attacking positions works better than the commonly used greedy search with word importance ranking.

4. We make a thorough evaluation of the trade-off between query budget and attack performance and provide an in-depth analysis of our method.


Textual Adversarial Examples

In this paper, we discuss adversarial examples created by deliberately adding minor perturbations to clean examples. Formally, for a trained classifier $F$, a clean sample $X$, the ground truth label $Y_{true}$, and the maximum allowed perturbation $\epsilon$, the adversarial example $X_{adv}$ can be described as
$$F(X_{adv}) \neq Y_{true}, \qquad \|X - X_{adv}\| \leq \epsilon,$$
which is a non-targeted attack. Moreover, when the output of $F$ is misled to a specified label $c$, i.e., $F(X_{adv}) = c$, it is called a targeted attack.

Clean sample $X$ is composed of a sequence of $n$ words, $X = [w_1, w_2, \ldots, w_n]$.
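
A hedged sketch of the non-targeted success test implied by this definition is shown below; the classifier F, similarity function Sim, and the threshold defaults are illustrative stand-ins for the paper's own components, with the perturbation size measured as the fraction of changed words.

```python
# A sketch of the non-targeted attack success criterion; F, Sim, and the
# default thresholds are assumptions, not values from the paper.
def is_successful_attack(F, Sim, x, x_adv, y_true,
                         max_perturb=0.15, theta_use=0.8):
    """x and x_adv are word lists of equal length (word substitutions only)."""
    n_changed = sum(w != w_adv for w, w_adv in zip(x, x_adv))
    within_budget = n_changed / len(x) <= max_perturb      # ||X - X_adv|| bounded
    preserves_meaning = Sim(" ".join(x), " ".join(x_adv)) >= theta_use
    return F(x_adv) != y_true and within_budget and preserves_meaning
```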

Methodology

In this section we explain each part of our algorithm in detail. We show our proposed attack in Algorithm 1.

Algorithm 1: Textual Adversarial Attack
Input: The original text X, ground truth label Y_true, classifier F, word count n of X, max modification percentage m, semantic similarity function Sim(·), similarity threshold θ_USE, beam size b
Output: The adversarial text X_adv
1: Initialization: X_adv ← X, Adv_list ← ∅ // Python list
2: Append X_adv to Adv_list
3: for i = 1 to n·m do
4:   for each sentence s_j in Adv_list do
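
To make the search procedure concrete, here is a self-contained sketch of beam search with multiple random perturbing positions; the helpers candidates(text, pos) (substitutions for the word at pos, e.g. drawn from the surrogate model's embedding neighbours) and score(text) (the objective combining target-model scores and semantic similarity) are hypothetical names, not the paper's code.

```python
import random

# A sketch of the improved beam search; `candidates` and `score` are assumed
# helpers standing in for the surrogate-based substitution set and the
# attack objective, respectively.
def beam_search_attack(words, score, candidates, beam_size=5,
                       max_mod_rate=0.15, positions_per_step=3):
    beam = [list(words)]
    for _ in range(int(len(words) * max_mod_rate)):
        expanded = []
        for sent in beam:
            # Perturb several randomly chosen positions instead of the single
            # "most important" word used by greedy word-importance ranking.
            for pos in random.sample(range(len(sent)),
                                     k=min(positions_per_step, len(sent))):
                for cand in candidates(sent, pos):
                    new_sent = list(sent)
                    new_sent[pos] = cand
                    expanded.append(new_sent)
        if not expanded:
            break
        # Keep only the b highest-scoring candidates for the next iteration.
        beam = sorted(expanded, key=score, reverse=True)[:beam_size]
    return max(beam, key=score)
```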

Datasets

IMDB [34] is a document-level sentiment analysis dataset containing 50,000 non-professional movie reviews. It is widely used in the field of text classification.

SST-2 [35] is a sentence-level sentiment analysis dataset. The movie reviews are given by professionals.

MultiNLI [36] is a challenging dataset in the natural language inference (NLI) field with high semantic complexity. It includes evaluation data matched and mismatched with the training domains (MNLI-m and MNLI-mm).
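
For reference, the three benchmarks can be loaded as follows, assuming their standard HuggingFace hub identifiers (an assumption; the paper does not specify a data source).

```python
# Loading the three benchmarks via the HuggingFace `datasets` library; the
# hub identifiers are the standard ones, assumed rather than taken from the paper.
from datasets import load_dataset

imdb = load_dataset("imdb")          # document-level sentiment analysis
sst2 = load_dataset("glue", "sst2")  # sentence-level sentiment analysis
mnli = load_dataset("multi_nli")     # NLI with matched/mismatched eval splits

print(imdb["train"][0]["text"][:80])
print(sst2["train"][0]["sentence"])
print(mnli["validation_matched"][0]["premise"])
```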

Target Models

WordCNN consists of an embedding layer followed by convolutional filters of multiple widths, max-pooling, and a fully connected classification layer.
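
A minimal PyTorch sketch of such a WordCNN classifier, in the style of Kim's convolutional text classifier; all layer sizes are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

# A minimal WordCNN text classifier; embedding size, filter widths, and filter
# counts are illustrative assumptions.
class WordCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, num_classes=2,
                 filter_sizes=(3, 4, 5), num_filters=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in filter_sizes)
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, token_ids):                         # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)     # (batch, embed_dim, seq_len)
        # Convolve with each filter width, then max-pool over the sequence.
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))          # (batch, num_classes)
```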

Results

We conduct extensive experiments on two tasks: text classification and natural language inference. Using each of the above attack methods to generate 1,000 adversarial examples, we evaluate their performance with three automatic metrics: robust accuracy, perturbation rate, and semantic similarity.
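
The evaluation loop can be sketched as follows; attack (which returns an adversarial word list, or None on failure) and sim (a sentence-similarity model such as the Universal Sentence Encoder) are assumed helpers, not the paper's code.

```python
# A hedged sketch of computing the three automatic metrics over a test set;
# `attack` and `sim` are assumed stand-ins for the attack method and the
# sentence-similarity model.
def evaluate_attack(model, attack, sim, examples):
    fooled, perturb_rates, similarities = 0, [], []
    for words, label in examples:
        adv = attack(words, label)
        if adv is not None and model(adv) != label:
            fooled += 1
            changed = sum(a != b for a, b in zip(words, adv))
            perturb_rates.append(changed / len(words))
            similarities.append(sim(" ".join(words), " ".join(adv)))
    robust_accuracy = 1 - fooled / len(examples)  # lower means a stronger attack
    return robust_accuracy, perturb_rates, similarities
```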

Ablation Study

To illustrate the effectiveness of our proposed method, we conduct ablation experiments. We decompose the substitution strategy into three parts: 1) replacement ordering; 2) candidate construction; and 3) search method, and explore the impact of different combinations on attack performance. For replacement ordering, word importance ranking and random order are compared. For candidate construction, constraints such as the number of synonyms and the semantic similarity threshold are kept consistent.


Conclusion and Future Work

In this paper, we utilize an improved beam search and transferability from surrogate models to perform textual adversarial attacks under the black-box setting. Experimental results illustrate that our proposed method achieves a high attack success rate while preserving the original semantics. However, the candidates generated from the word embeddings are not comprehensive, and some valid substitutions are omitted. Therefore, utilizing large-scale language models to generate more semantic-preserving candidates is a promising direction for future work.

CRediT authorship contribution statement

Bin Zhu: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing - original draft. Zhaoquan Gu: Data curation, Writing - original draft, Funding acquisition. Yaguan Qian: Formal analysis, Investigation. Francis Lau: Writing - review & editing. Zhihong Tian: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We thank the anonymous reviewers for their very helpful comments which helped improve the presentation of this paper. This work is supported in part by the National Key R&D Program of China (No. 2019YFB1706003), the Major Key Project of PCL (No. PCL2022A03), the Key Program of Zhejiang Provincial Natural Science Foundation of China (No. LZ22F020007), the Natural Science Foundation of China (No. 61902082), the Guangdong Province Key R&D Program of China (No. 2019B010136003), and the Guangdong

Bin Zhu received his bachelor degree in Software Engineering from Southwest Jiaotong University (2019). He is currently earning his master’s degree in the Cyberspace Institute of Advanced Technology (CIAT), Guangzhou University, China. His research includes Natural Language Processing and Adversarial Robustness.

References (38)

  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you...
  • M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep contextualized word...
  • J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language...
  • H. Liu et al.

    Towards explainable NLP: A generative explanation framework for text classification

• C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of...
  • I.J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, arXiv preprint...
  • J. Gao, J. Lanchantin, M.L. Soffa, Y. Qi, Black-box generation of adversarial text sequences to evade deep learning...
  • N. Papernot, P.D. McDaniel, I.J. Goodfellow, S. Jha, Z.B. Celik, A. Swami, Practical black-box attacks against machine...
  • Y. Liu, X. Chen, C. Liu, D. Song, Delving into transferable adversarial examples and black-box attacks, CoRR...
  • A. Kurakin, I.J. Goodfellow, S. Bengio, Adversarial examples in the physical world, CoRR abs/1607.02533....
  • J. Li, W. Monroe, D. Jurafsky, Understanding neural networks through representation erasure, CoRR abs/1612.08220....
  • N. Papernot, P.D. McDaniel, A. Swami, R.E. Harang, Crafting adversarial input sequences for recurrent neural networks,...
  • S. Samanta, S. Mehta, Towards crafting text adversarial samples, CoRR abs/1707.02812....
  • J. Li, S. Ji, T. Du, B. Li, T. Wang, Textbugger: Generating adversarial text against real-world applications, CoRR...
  • Z. Gong, W. Wang, B. Li, D. Song, W.-S. Ku, Adversarial texts with gradient methods, CoRR abs/1801.07175....
  • M. Alzantot et al.

    Generating natural language adversarial examples

  • M. Sato, J. Suzuki, H. Shindo, Y. Matsumoto, Interpretable adversarial perturbation in input embedding space for text,...
  • S. Ren et al.

    Generating natural language adversarial examples through probability weighted word saliency

• D. Jin, Z. Jin, J.T. Zhou, P. Szolovits, Is BERT Really Robust? A Strong Baseline for Natural Language Attack on...

Zhaoquan Gu received his bachelor degree in Computer Science from Tsinghua University (2011) and PhD degree in Computer Science from Tsinghua University (2015). He is currently a Professor in the Cyberspace Institute of Advanced Technology (CIAT), Guangzhou University, China. His research includes wireless networks, distributed computing, and big data analysis.

Yaguan Qian received the B.S. degree in computer science from Tianjin University, in 1999, and the M.S. and Ph.D. degrees in computer science from Zhejiang University, in 2005 and 2014, respectively. He is an Associate Professor with the School of Big Data Science, Zhejiang University of Science and Technology. In his research areas, he has published more than 30 articles in international journals and conferences. His research interests mainly include machine learning, big-data analysis, pattern recognition, and machine vision. He is a member of the China Computer Federation (CCF), the Chinese Association for Artificial Intelligence (CAAI), and the Chinese Association of Automation (CAA).

Francis C.M. Lau received his PhD in computer science from the University of Waterloo in 1986. He has been a faculty member of the Department of Computer Science, The University of Hong Kong since 1987, where he served as the department chair from 2000 to 2005. He is now Associate Dean of the Faculty of Engineering, The University of Hong Kong. He was an honorary chair professor in the Institute of Theoretical Computer Science of Tsinghua University from 2007 to 2010. His research interests include computer systems and networking, algorithms, HCI, and applications of IT to arts. He is the editor-in-chief of the Journal of Interconnection Networks.

Zhihong Tian (Member, IEEE) is currently a Professor and the Dean of the Cyberspace Institute of Advanced Technology, Guangzhou University, Guangdong Province, China. He is a Distinguished Professor with Guangdong Province Universities and Colleges Pearl River Scholar. He is also a part-time Professor with Carleton University, Ottawa, Canada. Previously, he served in different academic and administrative positions with the Harbin Institute of Technology. His research has been supported in part by the National Natural Science Foundation of China, the National Key Research and Development Plan of China, the National High-tech Research and Development Program of China (863 Program), and the National Basic Research Program of China (973 Program). He has authored over 200 journal and conference papers in these areas. His research interests include computer networks and cyberspace security. He has also served as a member, a chair, and a general chair of a number of international conferences. He is a Senior Member of the China Computer Federation.
