Weakly supervised cyberbullying detection with participant-vocabulary consistency

Raisi, Elaheh; Huang, Bert

doi:10.1007/s13278-018-0517-y

Original Article
Published: 01 June 2018

Volume 8, article number 38 (2018)

Social Network Analysis and Mining

Abstract

Online harassment and cyberbullying are becoming serious social health threats that damage people’s lives. This phenomenon is creating a need for automated, data-driven techniques for analyzing and detecting such detrimental online behaviors. We propose a weakly supervised machine learning method for simultaneously inferring user roles in harassment-based bullying and new vocabulary indicators of bullying. The learning algorithm considers social structure and infers which users tend to bully and which tend to be victimized. To address the elusive nature of cyberbullying with minimal effort and cost, the learning algorithm requires only weak supervision. The weak supervision takes the form of a small, expert-provided seed set of bullying indicators, and the algorithm uses a large, unlabeled corpus of social media interactions to extract the bullying roles of users and additional vocabulary indicators of bullying. The model estimates whether each social interaction is bullying based on who participates and on what language is used, and it tries to maximize the agreement between these two estimates, i.e., participant-vocabulary consistency (PVC). To evaluate PVC, we perform extensive quantitative and qualitative experiments on three social media datasets: Twitter, Ask.fm, and Instagram. We illustrate the strengths and weaknesses of the model by analyzing the conversations and key phrases that PVC identifies. In addition, we present the distributions of bully and victim scores to examine the relationship between the tendencies of users to bully and to be victimized. We also perform a fairness evaluation to analyze the potential for automated detection to be biased against particular groups.
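
The agreement idea described in the abstract can be illustrated with a small sketch. The abstract does not give the objective, so everything below is an assumption made for illustration: each message is a (sender, receiver, words) triple, the participant estimate is the average of the sender's bully score and the receiver's victim score, disagreement with each word score is penalized by squared error, an L2 penalty with weight `lam` is added, and seed words are pinned to a score of 1. The function names `pvc_objective` and `fit_pvc` and the toy data are hypothetical.

```python
"""Toy sketch of participant-vocabulary consistency (assumed squared-error form)."""


def pvc_objective(messages, bully, victim, word, lam):
    """Assumed objective: L2 penalty plus squared participant/word disagreement."""
    penalty = sum(x * x for table in (bully, victim, word) for x in table.values())
    disagreement = 0.0
    for sender, receiver, words in messages:
        participant_score = (bully[sender] + victim[receiver]) / 2.0
        for k in words:
            disagreement += (participant_score - word[k]) ** 2
    return 0.5 * lam * penalty + 0.5 * disagreement


def _update_users(messages, own, other_scores, word, lam, users):
    """Closed-form minimizer for one role's scores, holding everything else fixed.

    own=0 updates sender (bully) scores; own=1 updates receiver (victim) scores.
    """
    num = dict.fromkeys(users, 0.0)
    cnt = dict.fromkeys(users, 0)
    for msg in messages:
        me, other = msg[own], msg[1 - own]
        for k in msg[2]:
            num[me] += 2.0 * word[k] - other_scores[other]
            cnt[me] += 1
    return {u: num[u] / (4.0 * lam + cnt[u]) for u in users}


def fit_pvc(messages, seed_words, lam=0.1, iterations=100):
    """Alternate exact block updates; lam must be > 0 so denominators stay nonzero."""
    users = {u for s, r, _ in messages for u in (s, r)}
    vocab = {k for _, _, words in messages for k in words} | set(seed_words)
    bully = dict.fromkeys(users, 0.0)
    victim = dict.fromkeys(users, 0.0)
    # Weak supervision: seed words start at 1 and are never updated.
    word = {k: (1.0 if k in seed_words else 0.0) for k in vocab}

    for _ in range(iterations):
        bully = _update_users(messages, 0, victim, word, lam, users)
        victim = _update_users(messages, 1, bully, word, lam, users)
        num = dict.fromkeys(vocab, 0.0)
        cnt = dict.fromkeys(vocab, 0)
        for sender, receiver, words in messages:
            participant_score = (bully[sender] + victim[receiver]) / 2.0
            for k in words:
                num[k] += participant_score
                cnt[k] += 1
        for k in vocab:
            if k not in seed_words:
                word[k] = num[k] / (lam + cnt[k])
    return bully, victim, word


# Hypothetical toy corpus: two hostile messages and two neutral ones.
toy_messages = [
    ("alice", "bob", ["loser", "ugly"]),
    ("alice", "bob", ["ugly", "freak"]),
    ("carol", "dave", ["hello", "lunch"]),
    ("dave", "carol", ["lunch", "thanks"]),
]

if __name__ == "__main__":
    bully, victim, word = fit_pvc(toy_messages, {"loser"})
    for name, table in (("bully", bully), ("victim", victim), ("word", word)):
        print(name, {k: round(x, 3) for k, x in sorted(table.items())})
```

In this toy corpus only "loser" is a seed, yet "ugly" and "freak" pick up positive scores because they co-occur with the same sender and receiver, which is the kind of vocabulary expansion the abstract describes. Users who never exchange a message containing a seed-linked word keep a score of zero.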

Author information

Authors and Affiliations

Department of Computer Science, Virginia Tech, Blacksburg, USA
Elaheh Raisi & Bert Huang

Authors

Elaheh Raisi
Bert Huang

Corresponding author

Correspondence to Elaheh Raisi.

About this article

Cite this article

Raisi, E., Huang, B. Weakly supervised cyberbullying detection with participant-vocabulary consistency. Soc. Netw. Anal. Min. 8, 38 (2018). https://doi.org/10.1007/s13278-018-0517-y

Received: 21 December 2017
Revised: 07 May 2018
Accepted: 24 May 2018
Published: 01 June 2018
DOI: https://doi.org/10.1007/s13278-018-0517-y