Abstract
Deep learning models have achieved remarkable success across many domains, yet their susceptibility to adversarial attacks remains a pressing concern. Although recent advances in adversarial attack generation help to probe and strengthen model defenses, many existing techniques suffer from high perturbation rates, large query counts, reduced textual similarity, or low success rates. This paper addresses these shortcomings by proposing a dynamic search strategy that leverages the concept of attackability to guide and optimize the generation of adversarial attacks. The method improves the quality of generated adversarial samples by minimizing perturbation rates and query counts while maintaining high success rates. Experimental results demonstrate its effectiveness compared to existing techniques, representing a significant advance in adversarial attack generation.
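The abstract describes the approach only at a high level. As a rough illustration of how an attackability score can guide a query-efficient search, the Python sketch below ranks token positions by an occlusion-based attackability proxy and spends substitution queries only on positions that look attackable. This is a minimal sketch under stated assumptions: the occlusion proxy, the toy classifier `toy_predict_pos`, and the synonym table are hypothetical stand-ins, not the paper's actual components.

```python
# Hypothetical sketch of an attackability-guided word-substitution attack.
# NOTE: the occlusion-based attackability proxy, the toy classifier, and the
# synonym table are illustrative assumptions, not the paper's method.

from typing import Callable, Dict, List, Optional, Tuple

def attackability_ranking(tokens: List[str],
                          predict_pos: Callable[[str], float],
                          counter: List[int]) -> List[Tuple[float, int]]:
    """Rank positions by the confidence drop when the token is occluded."""
    base = predict_pos(" ".join(tokens)); counter[0] += 1
    scored = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + ["[UNK]"] + tokens[i + 1:]
        p = predict_pos(" ".join(occluded)); counter[0] += 1
        scored.append((base - p, i))          # larger drop = more attackable
    return sorted(scored, reverse=True)

def attack(text: str,
           predict_pos: Callable[[str], float],
           synonyms: Dict[str, List[str]]) -> Tuple[Optional[str], int, int]:
    """Greedy attack that only spends queries on attackable positions."""
    tokens, counter, perturbed = text.split(), [0], 0
    for score, i in attackability_ranking(tokens, predict_pos, counter):
        if score <= 0.0:                      # skip robust positions to save queries
            continue
        best = None
        for cand in synonyms.get(tokens[i].lower(), []):
            trial = tokens[:i] + [cand] + tokens[i + 1:]
            p = predict_pos(" ".join(trial)); counter[0] += 1
            if best is None or p < best[0]:
                best = (p, cand)
        if best is not None:
            tokens[i] = best[1]; perturbed += 1
            if best[0] < 0.5:                 # decision flipped: attack succeeded
                return " ".join(tokens), perturbed, counter[0]
    return None, perturbed, counter[0]        # failed within the query budget

# Toy stand-in for a black-box sentiment classifier (demo assumption only).
POSITIVE = {"great", "wonderful"}
def toy_predict_pos(text: str) -> float:
    hits = sum(w in POSITIVE for w in text.lower().split())
    return min(1.0, 0.4 + 0.3 * hits)

adv, n_perturbed, n_queries = attack(
    "the movie was great and wonderful",
    toy_predict_pos,
    {"great": ["fine"], "wonderful": ["decent"]},
)
print(adv, n_perturbed, n_queries)
```

Ranking positions once and skipping low-attackability ones is what keeps the query count and perturbation rate down in this sketch; the dynamic search strategy proposed in the paper is more elaborate than this greedy pass.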
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Khemis, S., Yacine, A., Akrem, B.M. (2025). Exploiting Attackability for Effective Textual Adversarial Attacks. In: Bennour, A., Bouridane, A., Almaadeed, S., Bouaziz, B., Edirisinghe, E. (eds) Intelligent Systems and Pattern Recognition. ISPR 2024. Communications in Computer and Information Science, vol 2303. Springer, Cham. https://doi.org/10.1007/978-3-031-82150-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-82149-3
Online ISBN: 978-3-031-82150-9