Social Media Hate Speech Detection Using Machine Learning Approach

Haider, Farhatul; Dipty, Ismotara; Rahman, Fiaj; Assaduzzaman, Md; Sohel, Amir

doi:10.1007/978-3-031-38296-3_17

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 673)

Included in the following conference series:

International Conference on Computational Intelligence in Data Science

Abstract

Humanity has profited enormously from the exchange of information and the expanding use of social media, but this growth has also raised a number of challenges, such as the persistence of hate speech. To address this growing problem on social media platforms, recent studies have used different feature engineering techniques and machine learning algorithms to automatically detect hate comments across numerous datasets. Several studies have compared feature engineering strategies combined with machine learning algorithms to determine which is the most effective. This investigation examines the performance of multiple feature engineering approaches with five machine learning algorithms. The datasets contain three class labels: hate speech, offensive comments, and not hate speech. The social media posts are divided into these groups. To recognize the particular traits of hate speech text messages, appropriate n-gram feature sets are extracted, and n-gram TF-IDF weights provide the foundation for these feature models. The main aim of this research is to analyze and address the above problem, to compare the algorithms and features used in machine learning to automatically detect hate speech, and to label posts into classes such as hate speech, offensive, and neither. Among the classifiers tested, Random Forest achieved better accuracy, precision, and recall than SVM (Support Vector Machine), Naive Bayes, Logistic Regression, AdaBoost, and Gradient Boosting. The system achieved an accuracy of 90.26% using Random Forest. The experimental results showed that Random Forest provided the best overall accuracy among the models built, and that it is more accurate than other recent work on this task. Based on the model's results, the intensity of the comments can be extracted.
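
To make the n-gram TF-IDF features mentioned in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. The abstract does not give the exact weighting, so the formula here is an assumption: term frequency is the n-gram count divided by the number of n-grams in the post, and inverse document frequency is ln(N / df). The function names `ngrams` and `tfidf`, the unigram-plus-bigram range, and the three toy posts are all hypothetical and are not taken from the paper's dataset.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, n_min=1, n_max=2):
    """Compute one {ngram: weight} dict per document.

    Assumed weighting (one common variant, not confirmed by the paper):
    tf = count / (number of n-grams in the document), idf = ln(N / df).
    """
    grams_per_doc = [ngrams(d.lower().split(), n_min, n_max) for d in docs]
    # Document frequency: in how many documents each n-gram appears.
    df = Counter(g for grams in grams_per_doc for g in set(grams))
    n_docs = len(docs)
    weights = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams) or 1  # avoid division by zero for empty posts
        weights.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return weights


# Toy posts invented for illustration only.
posts = ["you are awful", "you are kind", "have a nice day"]
features = tfidf(posts)
```

In this sketch an n-gram that occurs in only one post, such as "awful", receives a higher weight than one shared across posts, such as "you are". Weight dictionaries like these would then be turned into fixed-length vectors and passed to a classifier such as Random Forest.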

Author information

Authors and Affiliations

Department of CSE, Daffodil International University, Dhaka, Bangladesh
Farhatul Haider, Ismotara Dipty, Fiaj Rahman, Md Assaduzzaman & Amir Sohel

Corresponding author

Correspondence to Md Assaduzzaman.

Editor information

Editors and Affiliations

Sri Sivasubramaniya Nadar College of Engineering, Chennai, India
Sarath Chandran K R
Sri Sivasubramaniya Nadar College of Engineering, Chennai, India
Sujaudeen N
Sri Sivasubramaniya Nadar College of Engineering, Chennai, India
Beulah A
Sri Sivasubramaniya Nadar College of Engineering, Chennai, India
Shahul Hamead H

About this paper

Cite this paper

Haider, F., Dipty, I., Rahman, F., Assaduzzaman, M., Sohel, A. (2023). Social Media Hate Speech Detection Using Machine Learning Approach. In: Chandran K R, S., N, S., A, B., Hamead H, S. (eds) Computational Intelligence in Data Science. ICCIDS 2023. IFIP Advances in Information and Communication Technology, vol 673. Springer, Cham. https://doi.org/10.1007/978-3-031-38296-3_17

DOI: https://doi.org/10.1007/978-3-031-38296-3_17
Published: 22 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38295-6
Online ISBN: 978-3-031-38296-3
eBook Packages: Computer Science, Computer Science (R0)
