Abstract
The increasing adoption of deep learning for downstream natural language processing (NLP) tasks has created a need for models that can reliably assess linguistic acceptability. The CoLA corpus was created to support the development of models that accurately judge grammatical acceptability and thereby evaluate linguistic proficiency. Transformer models, widely used across NLP tasks including linguistic acceptability judgment, have limitations that undermine their perceived robustness. In particular, they are susceptible to adversarial text attacks: inconspicuous modifications to the original input, chosen so that the resulting adversarial examples are still correctly classified by human observers yet mislead the targeted model, thereby compromising its reliability. This paper presents ‘Homograph’, a novel framework for generating adversarial text in a black-box setting. The attack's effectiveness against linguistic acceptability models rests on its ability to generate visually similar adversarial examples that preserve the grammatical acceptability of the original inputs while causing the model to change its predicted label. For the linguistic acceptability task, we applied the attack to five transformer models fine-tuned on the CoLA dataset: ALBERT, BERT, DistilBERT, RoBERTa, and XLNet. Our work distinguishes itself from existing text-based attacks through three contributions. First, we surpass previous baselines in attack success rate (\(ASR\)) and average perturbation rate (\(APR\)) on models trained on the CoLA dataset. Second, we generate more potent adversarial examples whose modifications are imperceptible, thereby preserving the original label. Third, we employ a straightforward character-level transformation that produces adversarial examples closely resembling the original text.
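To make the mechanism concrete, the following is a minimal Python sketch of a homograph-style character substitution attack in a black-box setting. It is an illustrative reconstruction, not the authors' exact method: the homoglyph map is a small hypothetical subset, the greedy left-to-right search is an assumed heuristic, and the victim model is a stand-in callable. ASR is commonly computed as the fraction of inputs whose predicted label is flipped, and APR as the average fraction of perturbed tokens among successful attacks.

```python
# Illustrative sketch of a homograph-style character substitution attack
# in a black-box setting. NOT the paper's exact algorithm: the homoglyph
# map, the greedy left-to-right search, and the toy victim model below
# are all assumptions made for demonstration.

HOMOGLYPHS = {              # Latin letter -> visually similar look-alike
    "a": "\u0430",          # Cyrillic small a
    "c": "\u0441",          # Cyrillic small es
    "e": "\u0435",          # Cyrillic small ie
    "o": "\u043e",          # Cyrillic small o
    "p": "\u0440",          # Cyrillic small er
}


def perturb_word(word: str) -> str:
    """Swap the first mappable character for its visual homoglyph."""
    for i, ch in enumerate(word):
        if ch.lower() in HOMOGLYPHS:
            return word[:i] + HOMOGLYPHS[ch.lower()] + word[i + 1:]
    return word


def homograph_attack(text: str, predict):
    """Greedily perturb words until the black-box prediction flips.

    `predict` is any callable mapping a string to a label; only its
    outputs are observed (black-box access). Returns the adversarial
    text and the perturbation rate, or None if the attack fails.
    """
    original_label = predict(text)
    words = text.split()
    changed = 0
    for i, word in enumerate(words):
        new_word = perturb_word(word)
        if new_word == word:
            continue
        words[i] = new_word
        changed += 1
        candidate = " ".join(words)
        if predict(candidate) != original_label:
            return candidate, changed / len(words)  # label flipped
    return None  # every mappable character swapped, label unchanged


if __name__ == "__main__":
    # Toy victim classifier: "acceptable" (1) iff the text is pure ASCII.
    toy_model = lambda s: int(s.isascii())
    result = homograph_attack("The cat sat on the mat", toy_model)
    if result is not None:
        adversarial, apr = result
        print(f"adversarial: {adversarial!r}  (perturbation rate {apr:.2f})")
```

In the paper's actual setting, `predict` would be a transformer acceptability classifier (e.g., BERT fine-tuned on CoLA) queried only through its output labels; the sketch above merely illustrates how visually indistinguishable character substitutions can flip a model's decision without altering the sentence's grammatical acceptability for a human reader.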
Data availability
The data used to support the conclusions of this study are available upon request from the corresponding author.
Author information
Contributions
Sajal Aggarwal, Ashish Bajaj: Software, Validation, Investigation, Data Curation, Writing – Original Draft, Visualization, Conceptualization, Methodology. Dinesh Kumar Vishwakarma: Formal Analysis, Resources, Writing – Review & Editing, Supervision.
Ethics declarations
Conflict of interest
The authors declare that they have no competing financial interests or personal affiliations that could have influenced the findings presented in this manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Aggarwal, S., Bajaj, A. & Vishwakarma, D.K. HOMOGRAPH: a novel textual adversarial attack architecture to unmask the susceptibility of linguistic acceptability classifiers. Int. J. Inf. Secur. 24, 6 (2025). https://doi.org/10.1007/s10207-024-00925-w