Non-Alpha-Num: a novel architecture for generating adversarial examples for bypassing NLP-based clickbait detection mechanisms

  • Regular Contribution
International Journal of Information Security

Abstract

Most online media outlets rely heavily on revenue generated by reader views, and because such outlets are abundant, they must compete for reader attention. It is common practice for publishers to use attention-grabbing headlines to entice users to visit their websites. These headlines, commonly referred to as clickbait, exploit the curiosity gap experienced by users, enticing them to click on hyperlinks that frequently fail to meet their expectations. Identifying clickbait is therefore a significant NLP application. Previous studies have demonstrated that language models can detect clickbait effectively. Deep learning models have achieved great success in text-based tasks, but they are vulnerable to adversarial modifications. Such attacks make imperceptible alterations to a small number of words or characters to create deceptive text that misleads the model into making incorrect predictions. This work introduces “Non-Alpha-Num”, a new textual adversarial attack that operates at the character level in a black-box setting. The primary goal is to manipulate a target NLP model while keeping the alterations to the input imperceptible to human observers. Comprehensive tests were conducted to evaluate the proposed attack on several widely used models, including Word-CNN, BERT, DistilBERT, ALBERT, RoBERTa, and XLNet. These models were fine-tuned on the clickbait dataset, which is commonly used for clickbait detection. The empirical evidence suggests that the proposed attack consistently achieves much higher attack success rates (ASR) and produces high-quality adversarial examples compared with traditional adversarial manipulations. The findings suggest that clickbait detection systems can be circumvented, which might have significant implications for current policy efforts.
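The abstract does not spell out the attack procedure, so the following is only a minimal sketch of what a character-level, black-box attack and its ASR measurement could look like. It assumes, based on the attack's name alone, that perturbations insert non-alphanumeric symbols inside words, and it uses plain random search with label-only queries. The symbol set, the function names (`perturb`, `attack`, `attack_success_rate`), the query budget, and the keyword-based `toy_predict` classifier are all hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical symbol set; the paper's actual characters are not given here.
SYMBOLS = ".-_'"


def perturb(text: str, n_edits: int, rng: random.Random) -> str:
    """Insert one symbol inside each of up to n_edits randomly chosen words."""
    words = text.split(" ")
    # Only words with at least one interior position are candidates.
    candidates = [i for i, w in enumerate(words) if len(w) >= 3]
    for i in rng.sample(candidates, min(n_edits, len(candidates))):
        pos = rng.randrange(1, len(words[i]))
        words[i] = words[i][:pos] + rng.choice(SYMBOLS) + words[i][pos:]
    return " ".join(words)


def attack(text: str, label: int, predict: Callable[[str], int],
           budget: int = 20, n_edits: int = 2, seed: int = 0) -> Optional[str]:
    """Black-box random search: only predicted labels are observed."""
    rng = random.Random(seed)
    for _ in range(budget):
        candidate = perturb(text, n_edits, rng)
        if predict(candidate) != label:
            return candidate  # prediction flipped: adversarial example found
    return None  # query budget exhausted without flipping the label


def attack_success_rate(samples: Sequence[Tuple[str, int]],
                        predict: Callable[[str], int]) -> float:
    """ASR = successful attacks / inputs the model classified correctly."""
    attacked = [(t, y) for t, y in samples if predict(t) == y]
    if not attacked:
        return 0.0
    wins = sum(attack(t, y, predict) is not None for t, y in attacked)
    return wins / len(attacked)


def toy_predict(text: str) -> int:
    """Stand-in classifier: 1 (clickbait) if the exact token 'shocking' appears."""
    return int("shocking" in text.lower().split())


samples = [("shocking secret", 1), ("council meets", 0), ("shocking budget", 0)]
print(attack_success_rate(samples, toy_predict))
```

In this sketch, inputs that the model already misclassifies are excluded from the denominator, which is a common convention for ASR; whether the paper uses the same convention is not stated in the abstract.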

Data availability

The data supporting the conclusions of this study are available from the corresponding author on request.

Acknowledgements

This study received no specific financial support from governmental, corporate, or philanthropic organizations.

Author information

Contributions

Ashish Bajaj: Software, Validation, Investigation, Data Curation, Writing – Original Draft, Visualization, Conceptualization, Methodology. Dinesh Kumar Vishwakarma: Formal Analysis, Resources, Writing – Review & Editing, Supervision, Project Administration, Funding Acquisition.

Corresponding author

Correspondence to Dinesh Kumar Vishwakarma.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could be perceived as having influenced the research reported in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bajaj, A., Vishwakarma, D.K. Non-Alpha-Num: a novel architecture for generating adversarial examples for bypassing NLP-based clickbait detection mechanisms. Int. J. Inf. Secur. 23, 2711–2737 (2024). https://doi.org/10.1007/s10207-024-00861-9

  • DOI: https://doi.org/10.1007/s10207-024-00861-9
