Abstract
Online harassment and cyberbullying are becoming serious social health threats that damage people’s lives. This phenomenon creates a need for automated, data-driven techniques for analyzing and detecting such detrimental online behaviors. We propose a weakly supervised machine learning method that simultaneously infers user roles in harassment-based bullying and new vocabulary indicators of bullying. The learning algorithm considers social structure and infers which users tend to bully and which tend to be victimized. To address the elusive nature of cyberbullying with minimal effort and cost, the learning algorithm requires only weak supervision, in the form of a small, expert-provided seed set of bullying indicators. From this seed set and a large, unlabeled corpus of social media interactions, the algorithm extracts the bullying roles of users and additional vocabulary indicators of bullying. The model estimates whether each social interaction is bullying based on who participates and on what language is used, and it tries to maximize the agreement between these two estimates, i.e., participant-vocabulary consistency (PVC). To evaluate PVC, we perform extensive quantitative and qualitative experiments on three social media datasets: Twitter, Ask.fm, and Instagram. We illustrate the strengths and weaknesses of the model by analyzing the conversations and key phrases identified by PVC. In addition, we examine the distributions of bully and victim scores to study the relationship between users’ tendencies to bully and to be victimized. We also perform a fairness evaluation to analyze the potential for automated detection to be biased against particular groups.
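The consistency idea described above can be illustrated with a small sketch. It is not the formulation from the article body, which this abstract does not spell out. It assumes a squared-difference penalty between a participant-based estimate (the average of the sender's bully score and the receiver's victim score) and each word's indicator score, plus L2 regularization, with seed words pinned to a score of 1. The names `fit_pvc_sketch`, `lam`, and `n_iters`, and the toy messages, are hypothetical.

```python
from collections import defaultdict


def fit_pvc_sketch(messages, seed_words, lam=0.1, n_iters=50):
    """Alternating least-squares sketch of participant-vocabulary consistency.

    Assumed objective (an illustration, not taken from the abstract):
        lam/2 * (|b|^2 + |v|^2 + |w|^2)
        + 1/2 * sum over messages m and words k in m of
                ((b[sender] + v[receiver]) / 2 - w[k])^2
    messages: list of (sender, receiver, words) tuples.
    seed_words: expert-provided indicators, held fixed at score 1.0.
    Returns (b, v, w): bully scores, victim scores, word scores.
    """
    seed_words = set(seed_words)
    b = defaultdict(float)  # tendency of each user to bully
    v = defaultdict(float)  # tendency of each user to be victimized
    w = defaultdict(float)  # strength of each word as a bullying indicator
    for word in seed_words:
        w[word] = 1.0

    for _ in range(n_iters):
        # Bully scores: closed-form minimizer with v and w held fixed.
        num, cnt = defaultdict(float), defaultdict(int)
        for s, r, words in messages:
            for k in words:
                num[s] += 2.0 * w[k] - v[r]
                cnt[s] += 1
        for s in num:
            b[s] = num[s] / (4.0 * lam + cnt[s])

        # Victim scores: same form, with b and w held fixed.
        num, cnt = defaultdict(float), defaultdict(int)
        for s, r, words in messages:
            for k in words:
                num[r] += 2.0 * w[k] - b[s]
                cnt[r] += 1
        for r in num:
            v[r] = num[r] / (4.0 * lam + cnt[r])

        # Word scores: only non-seed words are updated.
        num, cnt = defaultdict(float), defaultdict(int)
        for s, r, words in messages:
            for k in words:
                if k not in seed_words:
                    num[k] += 0.5 * (b[s] + v[r])
                    cnt[k] += 1
        for k in num:
            w[k] = num[k] / (lam + cnt[k])

    return b, v, w


# Toy corpus: alice repeatedly targets bob; carol and dave chat normally.
toy_messages = [
    ("alice", "bob", ["idiot", "loser"]),
    ("alice", "bob", ["loser", "ugly"]),
    ("carol", "dave", ["lunch", "movie"]),
    ("dave", "carol", ["movie", "thanks"]),
]
b, v, w = fit_pvc_sketch(toy_messages, seed_words={"idiot"})
```

In this toy run, the single seed word propagates through alice's messages, so alice receives a higher bully score than carol, bob a higher victim score than dave, and the unlabeled word "loser" a higher indicator score than "movie". Each block update is the exact minimizer of the assumed objective with the other two sets of scores held fixed, which is why no step size is needed.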

Cite this article
Raisi, E., Huang, B. Weakly supervised cyberbullying detection with participant-vocabulary consistency. Soc. Netw. Anal. Min. 8, 38 (2018). https://doi.org/10.1007/s13278-018-0517-y
DOI: https://doi.org/10.1007/s13278-018-0517-y