Abstract
Deep learning models have achieved remarkable success across many domains, yet their susceptibility to adversarial attacks remains a pressing concern. Although recent advances in adversarial attack generation help to probe and strengthen model defenses, many existing techniques suffer from high perturbation rates, large query counts, reduced textual similarity, or low success rates. This paper addresses these shortcomings by proposing a dynamic search strategy that leverages the concept of attackability to guide and optimize the generation of adversarial attacks. The method improves the quality of generated adversarial samples by minimizing perturbation rates and query counts while maintaining high success rates. Experimental results demonstrate its effectiveness compared to existing techniques, representing a significant advance in adversarial attack generation.
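The abstract describes the approach only at a high level. As a rough illustration of how an attackability score can guide a query-efficient search, the Python sketch below ranks token positions by an occlusion-based attackability proxy and spends substitution queries only on positions that look attackable. This is a minimal sketch under stated assumptions: the occlusion proxy, the toy classifier `toy_predict_pos`, and the synonym table are hypothetical stand-ins, not the paper's actual components.

```python
# Hypothetical sketch of an attackability-guided word-substitution attack.
# NOTE: the occlusion-based attackability proxy, the toy classifier, and the
# synonym table are illustrative assumptions, not the paper's method.

from typing import Callable, Dict, List, Optional, Tuple

def attackability_ranking(tokens: List[str],
                          predict_pos: Callable[[str], float],
                          counter: List[int]) -> List[Tuple[float, int]]:
    """Rank positions by the confidence drop when the token is occluded."""
    base = predict_pos(" ".join(tokens)); counter[0] += 1
    scored = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + ["[UNK]"] + tokens[i + 1:]
        p = predict_pos(" ".join(occluded)); counter[0] += 1
        scored.append((base - p, i))          # larger drop = more attackable
    return sorted(scored, reverse=True)

def attack(text: str,
           predict_pos: Callable[[str], float],
           synonyms: Dict[str, List[str]]) -> Tuple[Optional[str], int, int]:
    """Greedy attack that only spends queries on attackable positions."""
    tokens, counter, perturbed = text.split(), [0], 0
    for score, i in attackability_ranking(tokens, predict_pos, counter):
        if score <= 0.0:                      # skip robust positions to save queries
            continue
        best = None
        for cand in synonyms.get(tokens[i].lower(), []):
            trial = tokens[:i] + [cand] + tokens[i + 1:]
            p = predict_pos(" ".join(trial)); counter[0] += 1
            if best is None or p < best[0]:
                best = (p, cand)
        if best is not None:
            tokens[i] = best[1]; perturbed += 1
            if best[0] < 0.5:                 # decision flipped: attack succeeded
                return " ".join(tokens), perturbed, counter[0]
    return None, perturbed, counter[0]        # failed within the query budget

# Toy stand-in for a black-box sentiment classifier (demo assumption only).
POSITIVE = {"great", "wonderful"}
def toy_predict_pos(text: str) -> float:
    hits = sum(w in POSITIVE for w in text.lower().split())
    return min(1.0, 0.4 + 0.3 * hits)

adv, n_perturbed, n_queries = attack(
    "the movie was great and wonderful",
    toy_predict_pos,
    {"great": ["fine"], "wonderful": ["decent"]},
)
print(adv, n_perturbed, n_queries)
```

Ranking positions once and skipping low-attackability ones is what keeps the query count and perturbation rate down in this sketch; the dynamic search strategy proposed in the paper is more elaborate than this greedy pass.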
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Khemis, S., Yacine, A., Akrem, B.M. (2025). Exploiting Attackability for Effective Textual Adversarial Attacks. In: Bennour, A., Bouridane, A., Almaadeed, S., Bouaziz, B., Edirisinghe, E. (eds) Intelligent Systems and Pattern Recognition. ISPR 2024. Communications in Computer and Information Science, vol 2303. Springer, Cham. https://doi.org/10.1007/978-3-031-82150-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-82149-3
Online ISBN: 978-3-031-82150-9