A transfer learning approach for detecting offensive and hate speech on social media platforms

Priyadarshini, Ishaani; Sahu, Sandipan; Kumar, Raghvendra

doi:10.1007/s11042-023-14481-3

Published: 15 February 2023

Volume 82, pages 27473–27499 (2023)

Multimedia Tools and Applications

Abstract

Over the last few decades, the expansion of technology and the internet has led to a rapid growth in the number of social media users, accompanied by an increase in hate speech. A critical concern is that hate speech not only incites violence and spreads hatred, but its detection also requires a considerable amount of computing resources and content monitoring by human experts and algorithms. Although this is an active research area and several artificial intelligence techniques have been proposed to address the problem, the growing volume of generated content, now measured in petabytes, calls for methods that offer improved performance and reduced model development time. We propose a transfer learning approach for detecting hate and offensive speech on social media that deploys a pre-trained model for data analysis, thereby promoting model reusability. We propose two transfer learning models, Google's Word2vec with LSTM and GloVe with LSTM, and compare their performance against unigram and bigram language models for Naive Bayes (NB), Decision Trees (DT), and Support Vector Machines (SVM), which serve as the baseline algorithms. The performance of the proposed models in classifying hate speech, offensive speech, and neutral speech is validated using precision, recall, F1 score, and support. The overall performance of the models across multiple datasets is evaluated with respect to accuracy. In-depth experimental analysis shows that the proposed models are robust for detecting hateful and offensive speech and perform better than the considered baseline algorithms.
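
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The function name per_class_report, the label strings, and the toy label lists below are assumptions made for illustration only; they are not taken from the paper, its datasets, or its code. The sketch computes per-class precision, recall, F1 score, and support for the three classes, plus overall accuracy.

```python
from collections import Counter

# Hypothetical label names for the three classes mentioned in the abstract.
LABELS = ("hate", "offensive", "neutral")


def per_class_report(y_true, y_pred, labels=LABELS):
    """Return ({label: (precision, recall, f1, support)}, accuracy)."""
    true_pos = Counter()             # correct predictions per class
    pred_count = Counter(y_pred)     # how often each class was predicted
    support = Counter(y_true)        # how often each class truly occurs
    for t, p in zip(y_true, y_pred):
        if t == p:
            true_pos[t] += 1

    report = {}
    for label in labels:
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / support[label] if support[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1, support[label])

    accuracy = sum(true_pos.values()) / len(y_true)
    return report, accuracy


# Toy labels invented for illustration; not drawn from the paper's datasets.
y_true = ["hate", "hate", "offensive", "offensive", "offensive",
          "neutral", "neutral", "neutral"]
y_pred = ["hate", "offensive", "offensive", "offensive", "neutral",
          "neutral", "neutral", "hate"]

report, accuracy = per_class_report(y_true, y_pred)
for label, (p, r, f1, n) in report.items():
    print(f"{label:<10} precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={n}")
print(f"accuracy={accuracy:.3f}")
```

Support is simply the number of true examples per class, which matters here because a small hate-speech class can have low recall even when overall accuracy looks high.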

Data availability

Not Applicable.

Author information

Authors and Affiliations

School of Information, University of California, Berkeley, CA, USA
Ishaani Priyadarshini
Department of Computer Science and Engineering, Bengal Institute of Technology, Kolkata, India
Sandipan Sahu
Department of Computer Science and Engineering, GIET University, Gunupur, India
Raghvendra Kumar

Corresponding author

Correspondence to Raghvendra Kumar.

Ethics declarations

Conflict of interest

None.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Priyadarshini, I., Sahu, S. & Kumar, R. A transfer learning approach for detecting offensive and hate speech on social media platforms. Multimed Tools Appl 82, 27473–27499 (2023). https://doi.org/10.1007/s11042-023-14481-3

Received: 24 June 2021
Revised: 09 June 2022
Accepted: 31 January 2023
Published: 15 February 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s11042-023-14481-3
