Methodological Challenges for Detecting Interethnic Hostility on Social Media

  • Conference paper
Internet Science (INSCI 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11551)

Abstract

Detecting ethnic hate speech and other types of ethnicity representation in user texts is an important goal for social science, computer science, and public policy making. To date, quite a few algorithms have been trained to detect hate speech; however, what policy makers and social scientists need are complete pipelines, from the definition of ethnicity to a user-friendly monitoring system able to aggregate the results of large-scale social media analysis. In this essay, the author summarizes the experience of developing such a system in a series of projects under the author’s leadership. All steps of the proposed methodology are described and critically reviewed, with special attention paid to the strengths and limitations of the different approaches that were, or can be, applied along the pipeline. All conclusions are based on prior experiments with several large datasets from Russian-language social media, including 15,000 marked-up texts extracted from a representative one-year collection of 2.7 million user messages containing ethnonyms.

Acknowledgements

This paper is mainly based on experience from the research project “Development of concept and methodology for multi-level monitoring of the state of interethnic relations with the data from social media” (RSF grant No. 15-18-00091, 2015–2017), as well as on ongoing research conducted in the Laboratory for Internet Studies within the framework of the Basic Research Program of the National Research University Higher School of Economics. The author is thankful to all project participants: Sergei Koltcov, Konstantin Vorontsov, Sergey Nikolenko, Svetlana Bodrunova, Murat Apishev, Svetlana Alexeeva, and Oleg Nagornyy.

Author information

Corresponding author

Correspondence to Olessia Koltsova.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Koltsova, O. (2019). Methodological Challenges for Detecting Interethnic Hostility on Social Media. In: Bodrunova, S., et al. Internet Science. INSCI 2018. Lecture Notes in Computer Science, vol 11551. Springer, Cham. https://doi.org/10.1007/978-3-030-17705-8_1

  • DOI: https://doi.org/10.1007/978-3-030-17705-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-17704-1

  • Online ISBN: 978-3-030-17705-8

  • eBook Packages: Computer Science, Computer Science (R0)
