
In Situ Augmentation for Defending Against Adversarial Attacks on Text Classifiers

  • Conference paper
  • Neural Information Processing (ICONIP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13625)


Abstract

In text classification, recent research shows that adversarial attack methods can generate sentences that dramatically decrease the classification accuracy of state-of-the-art neural text classifiers. However, very few defense methods have been proposed against these high-quality adversarial sentences. In this paper, we propose LMAg (Language-Model-based Augmentation using Gradient Guidance), an in situ data augmentation method that serves as a defense mechanism in two representative defense setups. LMAg transforms the input text at test time: it uses the norm of the gradient to estimate how important each word is to the classifier's prediction, then replaces the most important words with alternatives proposed by a masked language model. Acting as an additional protection layer on top of the classifier, LMAg counteracts the perturbations introduced by adversarial attack methods and can therefore protect the classifier from adversarial attacks without additional training. Experimental results show that LMAg improves the after-attack accuracy of a BERT text classifier by 51.5% and 17.3% in the two setups, respectively.
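
The defense described in the abstract follows a simple pipeline: score each input token by the norm of the classifier's gradient with respect to its embedding, mask the most salient tokens, and let a masked language model propose replacements before classifying the transformed sentence. The sketch below illustrates one such pass with Hugging Face Transformers; the checkpoint names, the top-k parameter, and the single-pass setup are illustrative assumptions rather than the authors' implementation (the paper's method may, for example, aggregate several augmented copies).

```python
import torch
from transformers import (AutoModelForMaskedLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

# Assumed checkpoints: any BERT classifier and a masked LM sharing its vocabulary.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
clf = AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-imdb")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
clf.eval()
mlm.eval()


def lmag_predict(text: str, k: int = 3) -> torch.Tensor:
    """One gradient-guided mask-and-replace pass, then classification."""
    enc = tok(text, return_tensors="pt", truncation=True)
    ids, attn = enc["input_ids"], enc["attention_mask"]

    # 1. Gradient of the predicted-class logit w.r.t. the input word embeddings.
    emb = clf.get_input_embeddings()(ids).detach().requires_grad_(True)
    logits = clf(inputs_embeds=emb, attention_mask=attn).logits
    logits[0, logits[0].argmax()].backward()

    # 2. Word importance = L2 norm of the gradient; ignore [CLS] and [SEP].
    saliency = emb.grad.norm(dim=-1).squeeze(0)
    saliency[0] = saliency[-1] = 0.0
    top = saliency.topk(min(k, saliency.numel() - 2)).indices

    # 3. Mask the most important tokens and ask the masked LM for replacements.
    masked = ids.clone()
    masked[0, top] = tok.mask_token_id
    with torch.no_grad():
        proposals = mlm(input_ids=masked, attention_mask=attn).logits[0, top].argmax(-1)
        augmented = ids.clone()
        augmented[0, top] = proposals

        # 4. Classify the in situ augmented sentence instead of the raw input.
        return clf(input_ids=augmented, attention_mask=attn).logits.softmax(-1)


print(lmag_predict("The movie was painfully dull and far too long."))
```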


Notes

  1. We were unable to attack the Yelp and IMDB datasets with PSO because it is inefficient on long sentences.


Acknowledgment

Alfredo Cuesta-Infante has been funded by the Spanish Government research project MICINN PID2021-128362OB-I00.

Author information


Corresponding author

Correspondence to Kalyan Veeramachaneni.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Xu, L., Berti-Equille, L., Cuesta-Infante, A., Veeramachaneni, K. (2023). In Situ Augmentation for Defending Against Adversarial Attacks on Text Classifiers. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_41

  • DOI: https://doi.org/10.1007/978-3-031-30111-7_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30110-0

  • Online ISBN: 978-3-031-30111-7

  • eBook Packages: Computer Science, Computer Science (R0)
