DOI: 10.1145/3488932.3517397

Generating Content-Preserving and Semantics-Flipping Adversarial Text

Published: 30 May 2022

Abstract

Natural Language Processing (NLP) models are often vulnerable to semantics-preserving adversarial attacks: they make different semantic predictions on input instances with similar content and similar semantics. However, it remains unclear to what extent modern NLP models are vulnerable to content-preserving and semantics-flipping (CPSF) adversarial attacks, in which they make the same semantic prediction on input instances with similar content but flipped semantics. Attackers can use either semantics-preserving or CPSF adversarial examples to create misunderstandings between humans and models and cause severe consequences in real-world applications, yet this equally important problem of CPSF adversarial examples has not been studied before. In this paper, we perform the first study of CPSF adversarial examples and propose CPSF adversarial attacks to reveal this new type of vulnerability in NLP models. We develop a two-stage approach to generate CPSF adversarial examples. Our experiments on two NLP tasks, sentiment analysis and textual entailment, demonstrate that CPSF adversarial examples can successfully fool victim models while, to humans, preserving the content and flipping the semantics. We further validate the strong transferability of CPSF adversarial examples against the NLP services of Microsoft and Google. Moreover, we demonstrate that adversarial training can mitigate CPSF adversarial attacks to a meaningful extent. Overall, our work implies that researchers need to improve the robustness of NLP models against CPSF adversarial attacks, which uniquely exploit blind spots where NLP models are insensitive even to large changes in semantics.
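
To make the described vulnerability concrete, the minimal sketch below probes whether a sentiment classifier keeps the same prediction on an input whose meaning a human would read as flipped. This is only an illustrative, assumption-laden sketch: the victim model (a Hugging Face sentiment-analysis pipeline) and the example sentences are stand-ins chosen for demonstration, not the paper's two-stage generation method.

    # Minimal sketch (not the paper's method): check a stand-in victim model for
    # the CPSF-style blind spot, i.e. an unchanged prediction despite an edit
    # that keeps similar content but flips the semantics for a human reader.
    from transformers import pipeline

    victim = pipeline("sentiment-analysis")  # assumed stand-in victim model

    original = "The food was great and the service was friendly."
    # Hand-crafted semantics-flipped variant: similar surface content, opposite meaning.
    flipped = "The food was hardly great and the service was anything but friendly."

    pred_orig = victim(original)[0]
    pred_flip = victim(flipped)[0]
    print(pred_orig, pred_flip)

    if pred_orig["label"] == pred_flip["label"]:
        # The model ignored a change that flips the meaning to a human reader.
        print("Potential CPSF-style blind spot: prediction unchanged despite flipped semantics.")

A real CPSF attack would search for such flipped variants automatically under content-similarity constraints; the sketch above only checks a single hand-written pair.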

Supplementary Material

MP4 File (ASIA-CCS22-fp338.mp4)
The presentation video of the paper "Generating Content-Preserving and Semantics-Flipping Adversarial Text".

Cited By

  • (2022) Towards Adversarial Attacks for Clinical Document Classification. Electronics 12(1): 129. DOI: 10.3390/electronics12010129. Online publication date: 28 Dec 2022.


      Published In

      ASIA CCS '22: Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security
      May 2022
      1291 pages
      ISBN: 9781450391405
      DOI: 10.1145/3488932

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 May 2022

      Author Tags

      1. adversarial examples
      2. sentiment analysis
      3. textual entailment

      Qualifiers

      • Research-article

      Funding Sources

      • National Science Foundation

      Conference

      ASIA CCS '22

      Acceptance Rates

      Overall Acceptance Rate 418 of 2,322 submissions, 18%

      Article Metrics

      • Downloads (Last 12 months): 54
      • Downloads (Last 6 weeks): 1
      Reflects downloads up to 05 Mar 2025
