Arabic Offensive Language Classification on Twitter

Mubarak, Hamdy; Darwish, Kareem

doi:10.1007/978-3-030-34971-4_18

Conference paper
First Online: 11 November 2019

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11864)

Abstract

Social media users often employ offensive language in their communication. Detecting offensive language on Twitter has many applications, ranging from detecting or predicting conflict to measuring polarization. In this paper, we focus on building an effective offensive tweet detector. We show that a training set can be built rapidly using a seed list of offensive words. Using this automatically created dataset, we trained a character n-gram based deep learning classifier that classifies tweets with an F1 score of 90%. We also show that the offensive word list can be expanded by contrasting offensive and non-offensive tweets.
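The pipeline outlined in the abstract can be illustrated with a small sketch. The abstract does not give the labeling rule, the n-gram range, or the contrast measure, so everything below is an assumption made for illustration: tweets are labeled offensive when they contain any seed word, character n-grams span lengths 2 to 4, and candidate words are ranked by a smoothed ratio of relative frequencies. The function names and the placeholder token "zork" are hypothetical, and no classifier is trained here.

```python
from collections import Counter


def label_by_seed(tweets, seed_words):
    """Label a tweet 1 (offensive) if it contains any seed word, else 0."""
    seeds = set(seed_words)
    return [(t, int(any(tok in seeds for tok in t.split()))) for t in tweets]


def char_ngrams(text, n_min=2, n_max=4):
    """Return all character n-grams of length n_min..n_max (assumed range)."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def contrast_scores(labeled, smoothing=1.0):
    """Score each word by the ratio of its smoothed relative frequency
    in offensive tweets to that in non-offensive tweets."""
    off, non = Counter(), Counter()
    for text, label in labeled:
        (off if label else non).update(text.split())
    vocab = set(off) | set(non)
    off_total = sum(off.values()) + smoothing * len(vocab)
    non_total = sum(non.values()) + smoothing * len(vocab)
    return {w: ((off[w] + smoothing) / off_total)
               / ((non[w] + smoothing) / non_total)
            for w in vocab}


def expand_seed_list(labeled, seed_words, top_k=1):
    """Return the top_k highest-scoring words not already in the seed list."""
    seeds = set(seed_words)
    scores = contrast_scores(labeled)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [w for w in ranked if w not in seeds][:top_k]


# Toy data; "zork" stands in for a seed word.
tweets = ["you are a zork fool", "zork fool go away", "nice day today",
          "what a nice match", "fool again zork"]
seeds = ["zork"]
labeled = label_by_seed(tweets, seeds)
new_words = expand_seed_list(labeled, seeds, top_k=1)
```

On the toy data, the word that co-occurs with the seed only in tweets labeled offensive ranks highest among the non-seed candidates. In practice the n-grams from `char_ngrams` would be fed to whatever classifier is chosen.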

Notes

1. http://svmlight.joachims.org/

Author information

Authors and Affiliations

Qatar Computing Research Institute, HBKU, Doha, Qatar
Hamdy Mubarak & Kareem Darwish

Corresponding author

Correspondence to Kareem Darwish.

Editor information

Editors and Affiliations

Qatar Computing Research Institute, Doha, Qatar
Ingmar Weber
Qatar Computing Research Institute, Doha, Qatar
Kareem M. Darwish
University of Koblenz-Landau, Koblenz, Germany
Claudia Wagner
Max Planck Institute for Demographic Research, Rostock, Germany
Emilio Zagheni
Northeastern University, Boston, MA, USA
Laura Nelson
Max Planck Institute for Demographic Research, Rostock, Germany
Samin Aref
GESIS-Leibniz Institute for the Social Sciences, Cologne, Germany
Fabian Flöck

About this paper

Cite this paper

Mubarak, H., Darwish, K. (2019). Arabic Offensive Language Classification on Twitter. In: Weber, I., et al. Social Informatics. SocInfo 2019. Lecture Notes in Computer Science, vol 11864. Springer, Cham. https://doi.org/10.1007/978-3-030-34971-4_18

DOI: https://doi.org/10.1007/978-3-030-34971-4_18
Published: 11 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34970-7
Online ISBN: 978-3-030-34971-4
eBook Packages: Computer Science, Computer Science (R0)
