Social Media Hate Speech Detection Using Machine Learning Approach

  • Conference paper
Computational Intelligence in Data Science (ICCIDS 2023)

Abstract

Humanity has profited enormously from the exchange of information and the expanding use of social media, but this growth has also raised a number of challenges, such as the persistence of hate speech. To address this growing problem on social media platforms, recent studies have used various feature engineering techniques and machine learning algorithms to automatically detect hate comments across numerous datasets. Several studies have compared feature engineering strategies with machine learning algorithms to determine which strategy is the most effective. This investigation aims to examine the performance of multiple feature engineering approaches with five machine learning algorithms. The datasets contain the class labels hate speech, not hate speech, and offensive comments, and the social media posts are divided into these groups. To recognize the particular traits of hate speech text messages, appropriate n-gram feature sets are extracted, and n-gram TF-IDF weights provide the foundation for these feature models. The main aim of this research is to analyze and address the above problem, and to compare the machine learning algorithms and features used to automatically detect hate speech and label posts into classes such as hate speech, offensive, and neither. Among the classifiers evaluated, Random Forest achieved better accuracy, precision, and recall than SVM (Support Vector Machine), Naive Bayes, Logistic Regression, AdaBoost, and Gradient Boosting. The system achieved an accuracy of 90.26% using Random Forest. The experimental results showed that Random Forest provided the best overall accuracy among the models built, and that it is more accurate than other recent work on this problem. Finally, based on the model's results, the intensity of the comments can be extracted.
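
As a rough illustration of the n-gram TF-IDF weighting the abstract refers to, the sketch below computes weights for word unigrams and bigrams using only the Python standard library. It is not the authors' implementation: the function names `word_ngrams` and `tfidf_weights`, the whitespace tokenizer, the unsmoothed `log(N / df)` form of IDF, and the toy posts are all assumptions made for this example.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased text."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_weights(docs, n_max=2):
    """Compute one {ngram: weight} dict per document.

    tf  = count of the n-gram in the document / number of n-grams in it
    idf = log(N / df), where df is the number of documents containing it
    This is a plain textbook variant; libraries often smooth and normalise.
    """
    grams_per_doc = [word_ngrams(d, n_max) for d in docs]
    # Document frequency: count each n-gram once per document.
    doc_freq = Counter(g for grams in grams_per_doc for g in set(grams))
    n_docs = len(docs)
    weights = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams) or 1  # avoid division by zero on empty posts
        weights.append({
            g: (c / total) * math.log(n_docs / doc_freq[g])
            for g, c in counts.items()
        })
    return weights


# Toy, made-up posts for illustration only (not taken from the dataset).
posts = [
    "you are great",
    "you are awful",
    "have a great day",
]
weights = tfidf_weights(posts)
```

In this toy corpus the bigram "are great" occurs in one post only, so it receives a higher weight in the first post than "you are", which two posts share. Feature vectors built this way are what a classifier such as Random Forest would be trained on.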

Author information

Corresponding author

Correspondence to Md Assaduzzaman.

Copyright information

© 2023 IFIP International Federation for Information Processing

About this paper

Cite this paper

Haider, F., Dipty, I., Rahman, F., Assaduzzaman, M., Sohel, A. (2023). Social Media Hate Speech Detection Using Machine Learning Approach. In: Chandran K R, S., N, S., A, B., Hamead H, S. (eds) Computational Intelligence in Data Science. ICCIDS 2023. IFIP Advances in Information and Communication Technology, vol 673. Springer, Cham. https://doi.org/10.1007/978-3-031-38296-3_17

  • DOI: https://doi.org/10.1007/978-3-031-38296-3_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-38295-6

  • Online ISBN: 978-3-031-38296-3

  • eBook Packages: Computer Science (R0)
