Methodological Challenges for Detecting Interethnic Hostility on Social Media

Koltsova, Olessia

doi:10.1007/978-3-030-17705-8_1

Olessia Koltsova ORCID: orcid.org/0000-0002-2669-3154

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11551)

Included in the following conference series:

International Conference on Internet Science

Abstract

Detecting ethnic hate speech and other types of ethnicity representation in user texts is an important goal for social science, computer science, and public policy making. To date, quite a few algorithms have been trained to detect hate speech. However, what policy makers and social scientists need are complete pipelines, from a definition of ethnicity to a user-friendly monitoring system able to aggregate the results of large-scale social media analysis. In this essay, the author summarizes the experience of developing such a system in a series of projects under the author’s leadership. All steps of the proposed methodology are described and critically reviewed, and special attention is paid to the strengths and limitations of the different approaches that were, or can be, applied along the pipeline. All conclusions are based on prior experiments with several large datasets from Russian-language social media, including 15,000 annotated texts extracted from a representative one-year collection of 2.7 million user messages containing ethnonyms.

Acknowledgements

This paper is mainly based on the experience from the research project “Development of concept and methodology for multi-level monitoring of the state of interethnic relations with the data from social media” RSF grant No 15-18-00091, 2015–2017, as well as the ongoing research implemented in the Laboratory for Internet Studies in the framework of the Basic Research Program of National Research University Higher School of Economics. The author is thankful to all project participants: Sergei Koltcov, Konstantin Vorontsov, Sergey Nikolenko, Svetlana Bodrunova, Murat Apishev, Svetlana Alexeeva, and Oleg Nagornyy.

Author information

Authors and Affiliations

Laboratory for Internet Studies, National Research University Higher School of Economics, Room 117, 55/2 Sedova Street, Saint-Petersburg, Russia
Olessia Koltsova

Authors

Olessia Koltsova

Corresponding author

Correspondence to Olessia Koltsova.

Editor information

Editors and Affiliations

St. Petersburg State University, St. Petersburg, Russia
Svetlana S. Bodrunova
National Research University Higher School of Economics, St. Petersburg, Russia
Olessia Koltsova
SINTEF, Trondheim, Norway
Asbjørn Følstad
Inria, Le Chesnay, France
Harry Halpin
National Research University Higher School of Economics, Moscow, Russia
Polina Kolozaridi
National Research University Higher School of Economics, Moscow, Russia
Leonid Yuldashev
St. Petersburg State University, St. Petersburg, Russia
Anna Smoliarova
TU München, Munich, Germany
Heiko Niedermayer

About this paper

Cite this paper

Koltsova, O. (2019). Methodological Challenges for Detecting Interethnic Hostility on Social Media. In: Bodrunova, S., et al. Internet Science. INSCI 2018. Lecture Notes in Computer Science, vol 11551. Springer, Cham. https://doi.org/10.1007/978-3-030-17705-8_1

DOI: https://doi.org/10.1007/978-3-030-17705-8_1
Published: 17 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17704-1
Online ISBN: 978-3-030-17705-8
eBook Packages: Computer Science, Computer Science (R0)
