Abstract
Social media users often employ offensive language in their communication. Detecting offensive language on Twitter has many applications, ranging from detecting or predicting conflict to measuring polarization. In this paper, we focus on building an effective offensive tweet detector. We show that a training set can be built rapidly using a seed list of offensive words. Given this automatically created dataset, we train a character n-gram based deep learning classifier that classifies tweets with an F1 score of 90%. We also show that the offensive word list can be expanded by contrasting offensive and non-offensive tweets.
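The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenization, the n-gram range, and the smoothed log-odds contrast score are all assumptions chosen for illustration, and the classifier itself is omitted. The sketch shows three steps: weakly labeling tweets with a seed list, extracting character n-gram features, and ranking non-seed words by how strongly they are associated with the offensive class.

```python
import math
from collections import Counter


def label_with_seeds(tweets, seed_words):
    """Weakly label each tweet: 1 if it contains any seed word, else 0."""
    seeds = set(seed_words)
    return [(t, int(any(tok in seeds for tok in t.lower().split())))
            for t in tweets]


def char_ngrams(text, n_min=2, n_max=4):
    """Return character n-grams of the space-padded, lowercased text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def contrast_scores(labeled, seed_words, alpha=1.0):
    """Score each non-seed token by a smoothed log-odds ratio (an assumed
    contrast measure): positive means more typical of offensive tweets."""
    seeds = set(seed_words)
    off, non = Counter(), Counter()
    for text, label in labeled:
        # Count each token once per tweet (document frequency).
        tokens = set(text.lower().split()) - seeds
        (off if label else non).update(tokens)
    vocab = set(off) | set(non)
    n_off, n_non, v = sum(off.values()), sum(non.values()), len(vocab)
    return {
        w: math.log((off[w] + alpha) / (n_off + alpha * v))
           - math.log((non[w] + alpha) / (n_non + alpha * v))
        for w in vocab
    }


# Toy data with placeholder English tokens (hypothetical, not from the paper).
tweets = [
    "you are a jerk and a clown",
    "what a clown honestly jerk",
    "nice weather today",
    "the weather is nice",
]
seed_words = ["jerk"]

labeled = label_with_seeds(tweets, seed_words)
scores = contrast_scores(labeled, seed_words)
candidates = sorted(scores, key=scores.get, reverse=True)
```

In this toy run, words that co-occur with the seed word rank above words seen only in the other tweets. Frequent function words can also rank highly, so in practice the candidate list would need filtering or manual review before it is added to the seed list.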
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Mubarak, H., Darwish, K. (2019). Arabic Offensive Language Classification on Twitter. In: Weber, I., et al. Social Informatics. SocInfo 2019. Lecture Notes in Computer Science, vol 11864. Springer, Cham. https://doi.org/10.1007/978-3-030-34971-4_18
DOI: https://doi.org/10.1007/978-3-030-34971-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34970-7
Online ISBN: 978-3-030-34971-4
eBook Packages: Computer Science (R0)