Searching for Textual Adversarial Examples with Learned Strategy

Guo, Xiangzhe; Su, Ruidan; Tu, Shikui; Xu, Lei

doi:10.1007/978-981-99-1639-9_32

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1791)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Adversarial attacks help reveal the vulnerabilities of neural networks. In text classification, synonym replacement is an effective way to generate adversarial examples. However, the number of replacement combinations grows exponentially with text length, which makes the search difficult. In this work, we propose an attack method that combines a synonym selection network with two search strategies: beam search and Monte Carlo tree search (MCTS). The synonym selection network learns the patterns of synonyms that have a strong attack effect. We combine the network with beam search to gain a broader view through multiple search paths, and with MCTS to gain a deeper view through exploration feedback, so as to avoid local optima effectively. We evaluate our method on four datasets in a challenging black-box setting that requires no access to the victim model's parameters. Experimental results show that our method generates high-quality adversarial examples with a higher attack success rate and fewer victim model queries, and further experiments show that it has higher transferability across victim models. The code and data can be obtained via https://github.com/CMACH508/SearchTextualAdversarialExamples.
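
To make the search setting concrete, the following is a minimal sketch of plain beam search over synonym substitutions against a black-box classifier. It is not the authors' implementation: the learned synonym selection network and the MCTS variant are omitted, and the synonym table, the word weights, `toy_victim`, and `beam_search_attack` are all hypothetical names and values invented for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# Hypothetical synonym table (an assumption, not the paper's synonym source).
SYNONYMS: Dict[str, List[str]] = {
    "good": ["fine", "decent"],
    "great": ["grand", "big"],
    "movie": ["film", "picture"],
}

# Hypothetical word weights, in percent, for the toy classifier below.
POSITIVE_WEIGHT: Dict[str, int] = {
    "good": 30, "great": 30, "fine": 10, "decent": 5, "grand": 10,
}


def toy_victim(tokens: List[str]) -> float:
    """Toy black-box victim: returns P(positive) and exposes no parameters."""
    percent = 20 + sum(POSITIVE_WEIGHT.get(t, 0) for t in tokens)
    return min(percent, 100) / 100


Candidate = Tuple[float, List[str], FrozenSet[int]]


def beam_search_attack(
    tokens: List[str],
    victim: Callable[[List[str]], float],
    synonyms: Dict[str, List[str]],
    beam_width: int = 2,
    max_steps: int = 3,
) -> Tuple[Optional[List[str]], int]:
    """Search for a substitution set that pushes P(positive) below 0.5.

    Returns the adversarial tokens (or None) and the number of victim queries.
    """
    queries = 1
    beam: List[Candidate] = [(victim(tokens), list(tokens), frozenset())]
    for _ in range(max_steps):
        candidates: Dict[Tuple[str, ...], Candidate] = {}
        for _, cand, changed in beam:
            for i, word in enumerate(cand):
                if i in changed:  # replace each position at most once
                    continue
                for syn in synonyms.get(word, []):
                    new_tokens = cand[:i] + [syn] + cand[i + 1:]
                    key = tuple(new_tokens)
                    if key in candidates:  # skip texts already queried this step
                        continue
                    candidates[key] = (victim(new_tokens), new_tokens, changed | {i})
                    queries += 1
        if not candidates:
            break
        # Lower P(true label) means a stronger attack; keep the best few paths.
        ranked = sorted(candidates.values(), key=lambda c: c[0])
        if ranked[0][0] < 0.5:
            return ranked[0][1], queries
        beam = ranked[:beam_width]
    return None, queries


adversarial, num_queries = beam_search_attack(
    "a good and great movie".split(), toy_victim, SYNONYMS
)
print(adversarial, num_queries)
```

In this toy case no single substitution flips the label, so the search has to combine two replacements. Keeping several partial paths in the beam is what the abstract describes as a broader view; in the proposed method, a learned network would additionally guide which synonyms are tried.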

Acknowledgement

This work was supported by the National Key R&D Program of China (2018AAA0100700) and the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Xiangzhe Guo, Ruidan Su, Shikui Tu & Lei Xu

Corresponding author

Correspondence to Shikui Tu.

Editor information

Editors and Affiliations

Indian Institute of Technology Indore, Indore, India
Mohammad Tanveer
Indian Institute of Information Technology - Allahabad, Prayagraj, India
Sonali Agarwal
Kobe University, Kobe, Japan
Seiichi Ozawa
Indian Institute of Technology Patna, Patna, India
Asif Ekbal
University of Innsbruck, Innsbruck, Austria
Adam Jatowt

Cite this paper

Guo, X., Su, R., Tu, S., Xu, L. (2023). Searching for Textual Adversarial Examples with Learned Strategy. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1791. Springer, Singapore. https://doi.org/10.1007/978-981-99-1639-9_32

DOI: https://doi.org/10.1007/978-981-99-1639-9_32
Published: 15 April 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1638-2
Online ISBN: 978-981-99-1639-9
eBook Packages: Computer Science, Computer Science (R0)
