Exploiting Attackability for Effective Textual Adversarial Attacks

  • Conference paper

Intelligent Systems and Pattern Recognition (ISPR 2024)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2303)


Abstract

Deep learning models have achieved remarkable success across various domains, yet their susceptibility to adversarial attacks remains a pressing concern. While recent adversarial attack techniques have helped strengthen model defenses, many existing methods suffer from drawbacks such as high perturbation rates, high query counts, reduced textual similarity, or low success rates. This paper addresses these problems by proposing a dynamic search strategy that leverages the concept of attackability to guide and optimise the generation of adversarial attacks. The method improves the quality of generated adversarial samples by minimizing perturbation rates and query counts while maintaining high success rates. Experimental results demonstrate its effectiveness compared to existing techniques, representing a significant advancement in the field of adversarial attack generation.
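
To make the abstract's idea concrete, the sketch below illustrates one way an attackability score could gate and order a query-bounded greedy word-substitution search, in the spirit of the strategy described above. It is a minimal Python illustration under assumed interfaces: the confidence oracle, the synonym table, the [UNK] masking probe, the 0.5 decision boundary, and the threshold and budget values are all hypothetical choices for this sketch, not the authors' actual algorithm.

from typing import Callable, Dict, List, Sequence

def attackability(probe_drops: Sequence[float]) -> float:
    # Proxy attackability score (assumed heuristic): the largest fall in
    # true-class confidence observed under cheap single-word probes.
    return max(probe_drops, default=0.0)

def guided_attack(
    words: List[str],
    synonyms: Dict[str, List[str]],            # hypothetical synonym table
    confidence: Callable[[List[str]], float],  # assumed oracle: P(true class | text)
    threshold: float = 0.05,                   # attackability gate (assumed value)
    max_queries: int = 200,                    # query budget (assumed value)
) -> List[str]:
    base = confidence(words)
    queries = 1
    # Probe each position once to estimate per-word attackability.
    drops = []
    for i in range(len(words)):
        masked = words[:i] + ["[UNK]"] + words[i + 1:]
        drops.append((base - confidence(masked), i))
        queries += 1
    # Skip inputs the proxy deems unattackable, saving the query budget.
    if attackability([d for d, _ in drops]) < threshold:
        return words
    # Perturb the most attackable positions first (dynamic ordering).
    adv, current = list(words), base
    for _, i in sorted(drops, reverse=True):
        if current < 0.5 or queries >= max_queries:
            break  # label flipped (binary-classifier assumption) or budget spent
        for cand in synonyms.get(adv[i], []):
            if queries >= max_queries:
                break
            trial = adv[:i] + [cand] + adv[i + 1:]
            score = confidence(trial)
            queries += 1
            if score < current:
                adv, current = trial, score  # keep the most harmful substitution found
                break
    return adv

Because the gate rejects inputs whose probes barely move the model, and the greedy pass visits the most attackable positions first, easy inputs are flipped with few substitutions and few queries, which is the trade-off between perturbation rate, query count, and success rate that the abstract targets.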



Corresponding author

Correspondence to Salim Khemis.

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Khemis, S., Yacine, A., Akrem, B.M. (2025). Exploiting Attackability for Effective Textual Adversarial Attacks. In: Bennour, A., Bouridane, A., Almaadeed, S., Bouaziz, B., Edirisinghe, E. (eds) Intelligent Systems and Pattern Recognition. ISPR 2024. Communications in Computer and Information Science, vol 2303. Springer, Cham. https://doi.org/10.1007/978-3-031-82150-9_8

  • DOI: https://doi.org/10.1007/978-3-031-82150-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-82149-3

  • Online ISBN: 978-3-031-82150-9

  • eBook Packages: Computer Science, Computer Science (R0)
