
HOMOGRAPH: a novel textual adversarial attack architecture to unmask the susceptibility of linguistic acceptability classifiers

  • Regular Contribution
  • Published in the International Journal of Information Security

Abstract

The increasing adoption of deep learning algorithms for automating downstream natural language processing (NLP) tasks has created a need to enhance their ability to assess linguistic acceptability. The CoLA corpus was created to support the development of models that can accurately judge grammatical acceptability and evaluate linguistic proficiency. Transformer models, widely used across NLP tasks including the evaluation of linguistic acceptability, may harbor limitations that undermine their perceived robustness. In particular, they are susceptible to adversarial text attacks: inconspicuous modifications to the original input text, chosen so that the resulting adversarial examples, although still correctly classified by human observers, mislead the targeted model and thereby compromise its reliability. This paper presents a novel framework called ‘Homograph’ for generating adversarial text in a black-box setting. The attack’s effectiveness against models designed for linguistic acceptability stems from its ability to generate visually similar adversarial examples that preserve the grammatical acceptability of the original input samples while still causing the model to change its predicted label. In the context of the linguistic acceptability task, we applied our attack to five transformer models fine-tuned on the CoLA dataset: ALBERT, BERT, DistilBERT, RoBERTa, and XLNet. Our work distinguishes itself from existing text-based attacks through several contributions. First, we surpass previous baselines in attack success rate (\(ASR\)) and average perturbation rate (\(APR\)) for models trained on the CoLA dataset. Second, we generate more potent adversarial examples containing imperceptible modifications, thereby preserving the original label. Third, we employ a straightforward character-level transformation technique to produce adversarial examples that closely resemble the original text.
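To make the character-level idea concrete, the sketch below illustrates a greedy, hard-label black-box homoglyph substitution loop in Python. It is a minimal illustration under stated assumptions, not the paper's HOMOGRAPH implementation: the HOMOGLYPHS confusable map, the perturb_word and homograph_attack helpers, and the toy_predict classifier are all hypothetical stand-ins.

```python
# Illustrative sketch of a homoglyph-style character substitution attack.
# NOT the paper's exact HOMOGRAPH algorithm: the confusable map, the greedy
# word-by-word search, and the toy classifier are assumptions for a
# hard-label black-box setting.

# A few Latin letters mapped to visually similar Unicode homoglyphs.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "c": "\u0441",  # Cyrillic small es
    "p": "\u0440",  # Cyrillic small er
    "i": "\u0456",  # Cyrillic-Ukrainian small i
}


def perturb_word(word: str) -> str:
    """Replace the first substitutable character with its homoglyph."""
    for idx, ch in enumerate(word):
        if ch in HOMOGLYPHS:
            return word[:idx] + HOMOGLYPHS[ch] + word[idx + 1:]
    return word  # nothing substitutable


def homograph_attack(text: str, predict, max_queries: int = 100):
    """Greedy hard-label black-box loop: perturb one word at a time,
    accumulate substitutions, and stop once the predicted label flips."""
    orig_label = predict(text)
    words = text.split()
    queries = 0
    for i, word in enumerate(words):
        new_word = perturb_word(word)
        if new_word == word:
            continue  # skip words with no substitutable character
        words[i] = new_word  # keep the perturbation cumulatively
        candidate = " ".join(words)
        queries += 1
        if queries > max_queries:
            break
        if predict(candidate) != orig_label:
            return candidate  # label flipped: adversarial example found
    return None  # attack failed within the query budget


if __name__ == "__main__":
    # Toy stand-in for a fine-tuned acceptability classifier: any non-ASCII
    # character breaks its (hypothetical) vocabulary lookup.
    def toy_predict(s: str) -> int:
        return 1 if s.isascii() else 0

    print(homograph_attack("The cat sat on the mat", toy_predict))
```

In practice the victim would be a fine-tuned transformer queried through its prediction API; the left-to-right word order and one-substitution-per-word policy here are simplifications of whatever search strategy the attack actually uses.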


Data availability

The data used to support the conclusions of this study are available upon request from the corresponding author.


Author information

Authors and Affiliations

Authors

Contributions

Sajal Aggarwal, Ashish Bajaj: Software, Validation, Investigation, Data Curation, Writing – Original Draft, Visualization, Conceptualization, Methodology. Dinesh Kumar Vishwakarma: Formal Analysis, Resources, Writing – Review & Editing, Supervision.

Corresponding author

Correspondence to Dinesh Kumar Vishwakarma.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have influenced the findings presented in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Aggarwal, S., Bajaj, A. & Vishwakarma, D.K. HOMOGRAPH: a novel textual adversarial attack architecture to unmask the susceptibility of linguistic acceptability classifiers. Int. J. Inf. Secur. 24, 6 (2025). https://doi.org/10.1007/s10207-024-00925-w

