Abstract
Online hate speech is a phenomenon with considerable consequences for our society. Its automatic detection using machine learning is a promising approach to containing its spread. However, classifying abusive language with a model that relies purely on text data is limited in performance due to the complexity and diversity of speech (e.g., irony, sarcasm). Moreover, studies have shown that a significant amount of hate on social media platforms stems from online hate communities. We therefore develop an abusive language detection model that leverages user and network data to improve classification performance. We integrate the explainable AI framework SHAP (SHapley Additive exPlanations) to alleviate the general issue of missing transparency associated with deep learning models, allowing us to reliably assess the model’s vulnerability to bias and systematic discrimination. Furthermore, we evaluate our multimodal architecture on three datasets in two languages (i.e., English and German). Our results show that user-specific timeline and network data can improve the classification, while the additional explanations resulting from SHAP make the model’s predictions interpretable to humans.
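SHAP attributes a model's prediction to its input features via Shapley values from cooperative game theory. The following is a minimal from-scratch sketch of exact Shapley-value computation, not the authors' pipeline: the toy additive scoring function and the feature names `text`, `user`, and `network` are hypothetical stand-ins for the modalities discussed in the paper.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values: each feature's average marginal contribution
    over all feature subsets (tractable only for a handful of features)."""
    n = len(features)
    phi = {f: 0.0 for f in features}
    for f in features:
        others = [g for g in features if g != f]
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[f] += weight * (value_fn(s | {f}) - value_fn(s))
    return phi

# Hypothetical additive "model": the score is a weighted sum over the
# modalities present (standing in for text, user, and network signals).
WEIGHTS = {"text": 0.5, "user": 0.3, "network": 0.2}

def score(present):
    return sum(WEIGHTS[f] for f in present)

phi = shapley_values(score, list(WEIGHTS))
# For a purely additive model, each feature's Shapley value equals its weight,
# and the values sum to the full model's score (the efficiency property).
```

Real SHAP implementations approximate these values for large feature sets; the exact enumeration above is only feasible for a few features, but it makes the attribution principle concrete.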
Warning: This paper contains content that may be abusive or offensive.
Notes
- 1. Code available on https://github.com/mawic/multimodal-abusive-language-detection.
- 2.
- 3. If a user is mentioned in a tweet, an “@” symbol appears before the user name.
- 4. Network data is not available for all users.
Acknowledgments
We would like to thank Anika Apel and Mariam Khuchua for their contribution to this project. The research has been partially funded by a scholarship from the Hanns Seidel Foundation financed by the German Federal Ministry of Education and Research.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wich, M., Mosca, E., Gorniak, A., Hingerl, J., Groh, G. (2021). Explainable Abusive Language Classification Leveraging User and Network Data. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science(), vol 12979. Springer, Cham. https://doi.org/10.1007/978-3-030-86517-7_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86516-0
Online ISBN: 978-3-030-86517-7