Abstract
Most online media outlets rely heavily on the revenue generated by their readers' views, and because such outlets are abundant, they must compete for reader attention. It is common practice for publishers to employ attention-grabbing headlines to entice users to visit their websites. These headlines, commonly referred to as clickbait, exploit the curiosity gap experienced by users, enticing them to click on hyperlinks that frequently fail to meet their expectations. Identifying clickbait is therefore a significant NLP application, and previous studies have demonstrated that language models can detect it effectively. Although deep learning models have attained great success on text-based tasks, they are vulnerable to adversarial modifications: attacks that make imperceptible alterations to a small number of words or characters in order to create a deceptive text that misleads the model into making incorrect predictions. The present work introduces "Non-Alpha-Num", a newly proposed character-level textual adversarial attack that operates in a black-box setting. The primary goal is to manipulate a target NLP model in such a way that the alterations made to the input are undetectable by human observers. A series of comprehensive experiments evaluates the efficacy of the proposed attack on several widely used models, including Word-CNN, BERT, DistilBERT, ALBERT, RoBERTa, and XLNet, each fine-tuned on a dataset commonly employed for clickbait detection. The empirical evidence shows that the proposed attack consistently achieves much higher attack success rates (ASR) and produces higher-quality adversarial examples than traditional adversarial manipulations. These findings indicate that clickbait detection systems can be circumvented, which may have significant implications for current policy efforts.
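The abstract names the attack but does not describe its search procedure. Purely as an illustration, the following minimal Python sketch shows what a character-level, black-box perturbation in the spirit of "Non-Alpha-Num" might look like: it inserts non-alphanumeric characters into headline words and queries the victim model after each edit until the predicted label flips. The character set, random insertion policy, and the victim_predict interface are illustrative assumptions, not the authors' published method.

import random

# Hypothetical sketch of a character-level black-box attack in the spirit
# of "Non-Alpha-Num". The character set, insertion positions, and random
# word choice are illustrative assumptions; the paper's actual search
# procedure is not described in this excerpt.
NON_ALPHANUM = list("~!@#$%^&*()-_=+|;:,.<>?/")

def perturb_headline(headline, victim_predict, max_edits=3, seed=0):
    """Insert non-alphanumeric characters into words of `headline`,
    querying the black-box classifier `victim_predict` (text -> label)
    after each edit, until the label flips or the edit budget is spent."""
    rng = random.Random(seed)
    original_label = victim_predict(headline)
    words = headline.split()
    for _ in range(max_edits):
        # Pick a word and an interior position at random; a real attack
        # would instead rank words by their effect on model confidence.
        i = rng.randrange(len(words))
        w = words[i]
        pos = rng.randrange(1, len(w)) if len(w) > 1 else 1
        words[i] = w[:pos] + rng.choice(NON_ALPHANUM) + w[pos:]
        candidate = " ".join(words)
        if victim_predict(candidate) != original_label:
            return candidate  # adversarial example found
    return " ".join(words)  # budget exhausted; label may not have flipped

Under this framing, the attack success rate (ASR) reported above would be the fraction of originally correctly classified headlines for which such a search finds a label-flipping perturbation.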
Data availability
The data used to support the findings of this study are available from the corresponding author upon request.
Acknowledgements
This study received no specific financial support from governmental, commercial, or not-for-profit organizations.
Author information
Contributions
Ashish Bajaj: Software, Validation, Investigation, Data Curation, Writing – Original Draft, Visualization, Conceptualization, Methodology.
Dinesh Kumar Vishwakarma: Formal Analysis, Resources, Writing – Review & Editing, Supervision, Project Administration, Funding Acquisition.
Ethics declarations
Conflict of interest
The authors declare that they have no identifiable competing financial interests or personal relationships that could be perceived as influencing the research reported in this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bajaj, A., Vishwakarma, D.K. Non-Alpha-Num: a novel architecture for generating adversarial examples for bypassing NLP-based clickbait detection mechanisms. Int. J. Inf. Secur. 23, 2711–2737 (2024). https://doi.org/10.1007/s10207-024-00861-9