Weakly supervised cyberbullying detection with participant-vocabulary consistency

  • Original Article
Social Network Analysis and Mining

Abstract

Online harassment and cyberbullying are becoming serious social health threats that damage people’s lives, creating a need for automated, data-driven techniques for analyzing and detecting such harmful online behavior. We propose a weakly supervised machine learning method that simultaneously infers user roles in harassment-based bullying and new vocabulary indicators of bullying. The learning algorithm considers social structure and infers which users tend to bully and which tend to be victimized. To address the elusive nature of cyberbullying with minimal effort and cost, the algorithm requires only weak supervision, in the form of a small, expert-provided seed set of bullying indicators. It then uses a large, unlabeled corpus of social media interactions to extract users’ bullying roles and additional vocabulary indicators of bullying. The model estimates whether each social interaction is bullying based on who participates and on what language is used, and it tries to maximize the agreement between these two estimates, i.e., participant-vocabulary consistency (PVC). To evaluate PVC, we perform extensive quantitative and qualitative experiments on three social media datasets: Twitter, Ask.fm, and Instagram. We illustrate the strengths and weaknesses of the model by analyzing the conversations and key phrases that PVC identifies. In addition, we examine the distributions of bully and victim scores to study the relationship between users’ tendencies to bully and to be victimized. We also perform a fairness evaluation to analyze the potential for automated detection to be biased against particular groups.
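The idea of making participant-based and vocabulary-based estimates agree can be illustrated with a minimal sketch. It is not the authors' implementation: the squared-error agreement penalty, the L2 regularization, the plain gradient descent, the function names (`fit_pvc`, `message_score`), the toy messages, and all hyperparameters are assumptions chosen for illustration. Each sender gets a bully score, each receiver a victim score, and each word an indicator score; seed words are held at 1.0 while every other score is learned from unlabeled messages.

```python
from collections import defaultdict


def fit_pvc(messages, seed_words, reg=0.1, lr=0.05, epochs=500):
    """Illustrative sketch only. messages: list of (sender, receiver, words)."""
    bully = defaultdict(float)   # tendency of a user to bully (as sender)
    victim = defaultdict(float)  # tendency of a user to be victimized (as receiver)
    vocab = defaultdict(float)   # how strongly a word indicates bullying
    for w in seed_words:
        vocab[w] = 1.0           # weak supervision: seed indicators stay fixed

    for _ in range(epochs):
        gb, gv, gw = defaultdict(float), defaultdict(float), defaultdict(float)
        for s, r, words in messages:
            participant = 0.5 * (bully[s] + victim[r])
            for w in words:
                # Disagreement between the participant and vocabulary estimates.
                diff = participant - vocab[w]
                gb[s] += 0.5 * diff
                gv[r] += 0.5 * diff
                gw[w] -= diff
        # Gradient step on the squared disagreement plus L2 regularization.
        for u in gb:
            bully[u] -= lr * (gb[u] + reg * bully[u])
        for u in gv:
            victim[u] -= lr * (gv[u] + reg * victim[u])
        for w in gw:
            if w not in seed_words:
                vocab[w] -= lr * (gw[w] + reg * vocab[w])
    return bully, victim, vocab


def message_score(sender, receiver, words, bully, victim, vocab):
    """Average the participant-based and language-based estimates (an assumption)."""
    participant = 0.5 * (bully[sender] + victim[receiver])
    language = sum(vocab[w] for w in words) / len(words)
    return 0.5 * (participant + language)


# Hypothetical toy corpus: (sender, receiver, tokenized message).
messages = [
    ("alice", "bob", ["you", "idiot", "loser"]),
    ("alice", "bob", ["such", "loser", "creep"]),
    ("carol", "dave", ["nice", "photo"]),
    ("dave", "carol", ["thanks", "friend"]),
]
seed_words = {"idiot"}

bully, victim, vocab = fit_pvc(messages, seed_words)
harsh = message_score("alice", "bob", ["such", "loser", "creep"], bully, victim, vocab)
mild = message_score("carol", "dave", ["nice", "photo"], bully, victim, vocab)
```

In this toy run the single seed word raises the scores of the users who exchange it, and those user scores in turn raise the scores of the non-seed words they exchange, which is the kind of vocabulary expansion the abstract describes. Users and words with no path to a seed word keep a score of zero.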

Author information

Corresponding author

Correspondence to Elaheh Raisi.

About this article

Cite this article

Raisi, E., Huang, B. Weakly supervised cyberbullying detection with participant-vocabulary consistency. Soc. Netw. Anal. Min. 8, 38 (2018). https://doi.org/10.1007/s13278-018-0517-y

  • DOI: https://doi.org/10.1007/s13278-018-0517-y
