Abstract
Most online media outlets rely heavily on the revenue generated by their readers' views, and because such outlets are abundant, they must compete for reader attention. It is common practice for publishers to employ attention-grabbing headlines to entice users to visit their websites. These headlines, commonly referred to as clickbait, exploit the curiosity gap experienced by users, enticing them to click on hyperlinks that frequently fail to meet their expectations. Identifying clickbait is therefore a significant NLP application, and previous studies have demonstrated that language models can detect it effectively. Although deep learning models have attained great success on text-based tasks, they are vulnerable to adversarial modifications: attacks that make imperceptible alterations to a small number of words or characters in order to create a deceptive text that misleads the model into making incorrect predictions. The present work introduces "Non-Alpha-Num", a newly proposed character-level textual adversarial attack that operates in a black-box setting. The primary goal is to manipulate a target NLP model in such a way that the alterations made to the input are undetectable by human observers. A series of comprehensive experiments evaluates the efficacy of the proposed attack on several widely used models, including Word-CNN, BERT, DistilBERT, ALBERT, RoBERTa, and XLNet, each fine-tuned on a dataset commonly employed for clickbait detection. The empirical evidence shows that the proposed attack consistently achieves much higher attack success rates (ASR) and produces higher-quality adversarial examples than traditional adversarial manipulations. These findings indicate that clickbait detection systems can be circumvented, which may have significant implications for current policy efforts.
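The abstract names the attack but does not describe its search procedure. Purely as an illustration, the following minimal Python sketch shows what a character-level, black-box perturbation in the spirit of "Non-Alpha-Num" might look like: it inserts non-alphanumeric characters into headline words and queries the victim model after each edit until the predicted label flips. The character set, random insertion policy, and the victim_predict interface are illustrative assumptions, not the authors' published method.

import random

# Hypothetical sketch of a character-level black-box attack in the spirit
# of "Non-Alpha-Num". The character set, insertion positions, and random
# word choice are illustrative assumptions; the paper's actual search
# procedure is not described in this excerpt.
NON_ALPHANUM = list("~!@#$%^&*()-_=+|;:,.<>?/")

def perturb_headline(headline, victim_predict, max_edits=3, seed=0):
    """Insert non-alphanumeric characters into words of `headline`,
    querying the black-box classifier `victim_predict` (text -> label)
    after each edit, until the label flips or the edit budget is spent."""
    rng = random.Random(seed)
    original_label = victim_predict(headline)
    words = headline.split()
    for _ in range(max_edits):
        # Pick a word and an interior position at random; a real attack
        # would instead rank words by their effect on model confidence.
        i = rng.randrange(len(words))
        w = words[i]
        pos = rng.randrange(1, len(w)) if len(w) > 1 else 1
        words[i] = w[:pos] + rng.choice(NON_ALPHANUM) + w[pos:]
        candidate = " ".join(words)
        if victim_predict(candidate) != original_label:
            return candidate  # adversarial example found
    return " ".join(words)  # budget exhausted; label may not have flipped

Under this framing, the attack success rate (ASR) reported above would be the fraction of originally correctly classified headlines for which such a search finds a label-flipping perturbation.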
Data availability
The data used to support the findings of this study are available from the corresponding author upon request.
Acknowledgements
This study received no specific financial support from governmental, commercial, or not-for-profit organizations.
Author information
Contributions
Ashish Bajaj: Software, Validation, Investigation, Data Curation, Writing – Original Draft, Visualization, Conceptualization, Methodology.
Dinesh Kumar Vishwakarma: Formal Analysis, Resources, Writing – Review & Editing, Supervision, Project Administration, Funding Acquisition.
Ethics declarations
Conflict of interest
The authors declare that they have no identifiable competing financial interests or personal relationships that could be perceived as influencing the research reported in this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bajaj, A., Vishwakarma, D.K. Non-Alpha-Num: a novel architecture for generating adversarial examples for bypassing NLP-based clickbait detection mechanisms. Int. J. Inf. Secur. 23, 2711–2737 (2024). https://doi.org/10.1007/s10207-024-00861-9