Threatening Expression and Target Identification in Under-Resource Languages Using NLP Techniques

Malik, Muhammad Shahid Iqbal

doi:10.1007/978-3-031-54534-4_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14486)

Included in the following conference series:

International Conference on Analysis of Images, Social Networks and Texts

Abstract

In recent decades, hate speech on social media platforms has been on the rise. Controlling this kind of material is highly desirable because it incites unrest and harms society. The literature describes several forms of hate speech, and it is challenging both to differentiate between these forms and to design an automated detection system, especially for under-resource languages. In this study, we propose a robust framework for identifying threatening expressions and their targets in the Urdu language (Nastaliq style). The proposed methodology presents each step in detail: data collection and annotation, cleaning and pre-processing, and fine-tuning of a Robustly Optimized Bidirectional Encoder Representations from Transformers model (Urdu-RoBERTa), with grid search for hyper-parameter optimization. The study exploits the strength of a pre-trained Urdu-RoBERTa as a transfer learning technique with grid search fine-tuning. The proposed framework is compared with a state-of-the-art baseline and ten comparable models, and it outperforms all of them on both tasks (threatening expression identification and target identification). Furthermore, the proposed framework obtains benchmark performance and improves the F1-score by a substantial margin.
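
The grid search step mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the abstract does not give the search space, the training code, or the F1 averaging scheme, so the grid values, the `train_and_eval` callback, the `toy_train_and_eval` stand-in, and the use of macro-averaged F1 are all assumptions made for illustration. In a real setup the callback would fine-tune the pre-trained model with the given configuration and return a validation score.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Macro-averaged F1 over all labels seen in y_true or y_pred.

    Assumption: the abstract only says "f1-score"; macro averaging is
    chosen here purely for illustration.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def grid_search(
    grid: Dict[str, List],
    train_and_eval: Callable[[Dict], float],
) -> Tuple[Dict, float]:
    """Try every combination in `grid` and keep the best-scoring one."""
    keys = list(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        score = train_and_eval(config)  # e.g. validation F1 after fine-tuning
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_train_and_eval(config: Dict) -> float:
    """Hypothetical stand-in for fine-tuning; NOT a real training run.

    It returns a made-up score that peaks at learning_rate=2e-5 and
    batch_size=16 so the search loop can be exercised without a model.
    """
    lr_penalty = abs(config["learning_rate"] - 2e-5) * 1e4
    bs_penalty = abs(config["batch_size"] - 16) / 100
    return -lr_penalty - bs_penalty


if __name__ == "__main__":
    # Hypothetical search space; the paper's actual grid is not given here.
    search_space = {
        "learning_rate": [1e-5, 2e-5, 3e-5],
        "batch_size": [8, 16, 32],
    }
    best, score = grid_search(search_space, toy_train_and_eval)
    print("best configuration:", best)
```

Exhaustive search trains one model per combination, so the cost grows multiplicatively with each added hyper-parameter; that is usually acceptable only for small grids such as the one sketched here.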

Acknowledgments

This article is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE University). Moreover, this research was supported in part by computational resources of HPC facilities at HSE University.

Author information

Authors and Affiliations

Department of Computer Science, National Research University Higher School of Economics, 11 Pokrovskiy Boulevard, Moscow, 109028, Russian Federation
Muhammad Shahid Iqbal Malik

Corresponding author

Correspondence to Muhammad Shahid Iqbal Malik.

Editor information

Editors and Affiliations

National Research University Higher School of Economics, Moscow, Russia
Dmitry I. Ignatov
Krasovskii Institute of Mathematics and Mechanics of Russian Academy of Sciences, Yekaterinburg, Russia
Michael Khachay
University of Oslo, Oslo, Norway
Andrey Kutuzov
American University of Armenia, Yerevan, Armenia
Habet Madoyan
Artificial Intelligence Research Institute, Moscow, Russia
Ilya Makarov
University of Hamburg, Hamburg, Germany
Irina Nikishina
Skolkovo Institute of Science and Technology, Moscow, Russia
Alexander Panchenko
Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Maxim Panov
University of Florida, Gainesville, FL, USA
Panos M. Pardalos
National Research University Higher School of Economics, Nizhny Novgorod, Russia
Andrey V. Savchenko
Apptek, Aachen, Germany
Evgenii Tsymbalov
Kazan Federal University, Kazan, Russia
Elena Tutubalina
MTS AI, Moscow, Russia
Sergey Zagoruyko

About this paper

Cite this paper

Malik, M.S.I. (2024). Threatening Expression and Target Identification in Under-Resource Languages Using NLP Techniques. In: Ignatov, D.I., et al. Analysis of Images, Social Networks and Texts. AIST 2023. Lecture Notes in Computer Science, vol 14486. Springer, Cham. https://doi.org/10.1007/978-3-031-54534-4_1

DOI: https://doi.org/10.1007/978-3-031-54534-4_1
Published: 12 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54533-7
Online ISBN: 978-3-031-54534-4
eBook Packages: Computer Science, Computer Science (R0)
