Abstract
Detection of ethnic hate speech and other types of ethnicity representation in user texts is an important goal for both social and computer science, as well as for public policy making. To date, quite a few algorithms have been trained to detect hate speech; however, what policy makers and social scientists need are complete pipelines, from the definition of ethnicity to a user-friendly monitoring system able to aggregate the results of large-scale social media analysis. In this essay, the author summarizes the experience of developing such a system in a series of projects under the author’s leadership. All steps of the proposed methodology are described and critically reviewed, and special attention is paid to the strengths and limitations of the different approaches that were, or can be, applied along the developed pipeline. All conclusions are based on prior experiments with several large datasets from Russian-language social media, including 15,000 marked-up texts extracted from a representative one-year collection of 2.7 million user messages containing ethnonyms.
Acknowledgements
This paper is mainly based on the experience from the research project “Development of concept and methodology for multi-level monitoring of the state of interethnic relations with the data from social media” RSF grant No 15-18-00091, 2015–2017, as well as the ongoing research implemented in the Laboratory for Internet Studies in the framework of the Basic Research Program of National Research University Higher School of Economics. The author is thankful to all project participants: Sergei Koltcov, Konstantin Vorontsov, Sergey Nikolenko, Svetlana Bodrunova, Murat Apishev, Svetlana Alexeeva, and Oleg Nagornyy.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Koltsova, O. (2019). Methodological Challenges for Detecting Interethnic Hostility on Social Media. In: Bodrunova, S., et al. Internet Science. INSCI 2018. Lecture Notes in Computer Science, vol 11551. Springer, Cham. https://doi.org/10.1007/978-3-030-17705-8_1
DOI: https://doi.org/10.1007/978-3-030-17705-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17704-1
Online ISBN: 978-3-030-17705-8
eBook Packages: Computer Science, Computer Science (R0)