Arabic Offensive Language Classification on Twitter

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11864)

Abstract

Social media users often employ offensive language in their communication. Detecting offensive language on Twitter has many applications, ranging from detecting and predicting conflict to measuring polarization. In this paper, we focus on building an effective offensive tweet detector. We show that a training set can be built rapidly from a seed list of offensive words. Using this automatically created dataset, we trained a character n-gram based deep learning classifier that classifies tweets with an F1 score of 90%. We also show that the offensive word list can be expanded by contrasting offensive and non-offensive tweets.
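
The three steps named in the abstract (seed-list labeling, character n-gram features, and contrasting offensive with non-offensive tweets) can be illustrated with a minimal sketch. This is not the authors' implementation: the seed words are English placeholders, the function names are hypothetical, the smoothed log-odds score is one assumed way of contrasting the two sets, and the deep learning classifier itself is not shown.

```python
from collections import Counter
import math

# Hypothetical placeholder seed list; the paper's actual list is not given here.
SEED_WORDS = {"badword", "insult"}

def weak_label(tweet, seeds=SEED_WORDS):
    """Label a tweet offensive (1) if any whitespace token is a seed word, else 0."""
    return int(any(tok in seeds for tok in tweet.lower().split()))

def char_ngrams(text, n_min=3, n_max=5):
    """Return all character n-grams of length n_min..n_max from space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def contrast_words(offensive, clean, min_count=2):
    """Rank words by add-one smoothed log-odds of occurring in offensive vs. clean tweets.

    Assumed scoring for illustration; the paper may use a different measure.
    """
    off = Counter(w for t in offensive for w in t.lower().split())
    cln = Counter(w for t in clean for w in t.lower().split())
    off_total, cln_total = sum(off.values()), sum(cln.values())
    vocab = len(set(off) | set(cln))
    scores = {}
    for word, count in off.items():
        if count < min_count:
            continue  # skip rare words, whose ratios are unreliable
        p_off = (count + 1) / (off_total + vocab)
        p_cln = (cln[word] + 1) / (cln_total + vocab)
        scores[word] = math.log(p_off / p_cln)
    return sorted(scores, key=scores.get, reverse=True)
```

Words that rank highly under `contrast_words` but are missing from the seed list would be candidates for manual review before being added to it.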

Author information

Corresponding author

Correspondence to Kareem Darwish.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Mubarak, H., Darwish, K. (2019). Arabic Offensive Language Classification on Twitter. In: Weber, I., et al. Social Informatics. SocInfo 2019. Lecture Notes in Computer Science, vol 11864. Springer, Cham. https://doi.org/10.1007/978-3-030-34971-4_18

  • DOI: https://doi.org/10.1007/978-3-030-34971-4_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34970-7

  • Online ISBN: 978-3-030-34971-4

  • eBook Packages: Computer Science (R0)