Threatening Expression and Target Identification in Under-Resource Languages Using NLP Techniques

  • Conference paper
Analysis of Images, Social Networks and Texts (AIST 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14486)

Abstract

In recent decades, hate speech on social media platforms has been on the rise. Controlling this kind of material is highly desirable because it incites unrest and harms society. The literature describes several forms of hate speech, and it is challenging to differentiate between these forms and to design an automated detection system, especially for under-resourced languages. In this study, we propose a robust framework for identifying threatening expressions and their targets in the Urdu language (Nastaliq style). The proposed methodology presents each step in detail: data collection and annotation, cleaning and pre-processing, and fine-tuning of a Robustly Optimized Bidirectional Encoder Representations from Transformers model (Urdu-RoBERTa) with grid search for hyper-parameter optimization. The study exploits the strength of a pre-trained Urdu-RoBERTa as a transfer learning technique with grid search fine-tuning. The proposed framework is compared with a state-of-the-art baseline and ten comparable models, and it outperformed all of them on both tasks (threatening expression and target identification). Furthermore, the proposed framework obtained benchmark performance and improved the F1-score by a substantial margin.
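The grid search step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the hyper-parameter names, the candidate values, and the `toy_score` function are assumptions made here for illustration. In a real setting, the scoring callback would fine-tune Urdu-RoBERTa with the given configuration and return a validation F1-score.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical search space; the paper's actual grid is not given in the abstract.
GRID: Dict[str, List[float]] = {
    "learning_rate": [1e-5, 2e-5, 3e-5],
    "batch_size": [8, 16, 32],
    "epochs": [2, 3, 4],
}


def grid_search(
    grid: Dict[str, List[float]],
    train_and_score: Callable[[Dict[str, float]], float],
) -> Tuple[Dict[str, float], float]:
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    best_config: Dict[str, float] = {}
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = train_and_score(config)
        if score > best_score:  # keep the first configuration with the top score
            best_config, best_score = config, score
    return best_config, best_score


def toy_score(config: Dict[str, float]) -> float:
    """Stand-in for fine-tuning plus validation (an assumption, not real training).

    The score is highest (zero) at learning_rate=2e-5, batch_size=16, epochs=3.
    """
    return (
        -abs(config["learning_rate"] - 2e-5) * 1e5
        - abs(config["batch_size"] - 16) / 16
        - abs(config["epochs"] - 3)
    )


best_config, best_score = grid_search(GRID, toy_score)
print(best_config, best_score)
```

Because the search is exhaustive, its cost grows with the product of the candidate counts (27 runs for the grid above), which is why such grids are usually kept small when each run is a full fine-tuning job.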

Acknowledgments

This article is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE University). Moreover, this research was supported in part by computational resources of HPC facilities at HSE University.

Author information

Corresponding author

Correspondence to Muhammad Shahid Iqbal Malik.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Malik, M.S.I. (2024). Threatening Expression and Target Identification in Under-Resource Languages Using NLP Techniques. In: Ignatov, D.I., et al. Analysis of Images, Social Networks and Texts. AIST 2023. Lecture Notes in Computer Science, vol 14486. Springer, Cham. https://doi.org/10.1007/978-3-031-54534-4_1

  • DOI: https://doi.org/10.1007/978-3-031-54534-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-54533-7

  • Online ISBN: 978-3-031-54534-4

  • eBook Packages: Computer Science, Computer Science (R0)
