DOI: 10.1145/3308558.3313462
research-article

Cyberbullying Ends Here: Towards Robust Detection of Cyberbullying in Social Media

Published: 13 May 2019

ABSTRACT

The potentially detrimental effects of cyberbullying have led to the development of numerous automated, data-driven approaches, with emphasis on classification accuracy. Cyberbullying, as a form of abusive online behavior, although not well-defined, is a repetitive process, i.e., a sequence of aggressive messages sent from a bully to a victim over a period of time with the intent to harm the victim. Existing work has focused on harassment (i.e., using profanity to classify toxic comments independently) as an indicator of cyberbullying, disregarding the repetitive nature of this harassing process. However, raising a cyberbullying alert immediately after an aggressive comment is detected can lead to a high number of false positives. At the same time, two key practical challenges remain unaddressed: (i) detection timeliness, which is necessary to support victims as early as possible, and (ii) scalability to the staggering rates at which content is generated in online social networks.

In this work, we introduce CONcISE, a novel approach for timely and accurate Cyberbullying detectiON on Instagram media SEssions. We propose a sequential hypothesis testing formulation that seeks to drastically reduce the number of features used in classifying each comment while maintaining high classification accuracy. CONcISE raises an alert only after a certain number of detections have been made. Extensive experiments on a real-world Instagram dataset with ~4M users and ~10M comments demonstrate the effectiveness, scalability, and timeliness of our approach and its benefits over existing methods.
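
The two ideas in the paragraph above (stopping early on each comment once enough features have been examined, and alerting only after several detections in a session) can be illustrated with a minimal sketch. This is not the CONcISE algorithm itself: it swaps in a generic Wald-style sequential probability ratio test over binary features, and every name, likelihood value, and threshold below (`SequentialCommentClassifier`, `first_alert_index`, `p_aggr`, `p_norm`, `min_detections`) is a hypothetical assumption made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass
class SequentialCommentClassifier:
    """Generic Wald-style sequential test over binary features.

    Illustrative stand-in only; not the formulation used in the paper.
    """
    # Assumed P(feature i present | aggressive comment).
    p_aggr: Sequence[float]
    # Assumed P(feature i present | non-aggressive comment).
    p_norm: Sequence[float]
    # Assumed stopping thresholds on the log-likelihood ratio.
    upper: float = math.log(19.0)
    lower: float = -math.log(19.0)

    def classify(self, features: Sequence[int]) -> Tuple[bool, int]:
        """Return (is_aggressive, number_of_features_examined)."""
        llr = 0.0
        used = 0
        for x, pa, pn in zip(features, self.p_aggr, self.p_norm):
            used += 1
            # Update the log-likelihood ratio with one more feature.
            if x:
                llr += math.log(pa / pn)
            else:
                llr += math.log((1.0 - pa) / (1.0 - pn))
            # Stop as soon as the evidence crosses either threshold.
            if llr >= self.upper:
                return True, used
            if llr <= self.lower:
                return False, used
        # All features examined without crossing a threshold:
        # fall back to the sign of the accumulated evidence.
        return llr > 0.0, used


def first_alert_index(flags: Iterable[bool], min_detections: int) -> int:
    """Index of the comment at which an alert fires, or -1 if it never does.

    An alert fires once `min_detections` comments in the session
    have been flagged as aggressive.
    """
    count = 0
    for i, flag in enumerate(flags):
        if flag:
            count += 1
        if count >= min_detections:
            return i
    return -1


if __name__ == "__main__":
    clf = SequentialCommentClassifier(
        p_aggr=[0.9, 0.8, 0.7],  # made-up likelihoods
        p_norm=[0.1, 0.2, 0.3],
    )
    session = [[1, 1, 1], [0, 0, 0], [1, 0, 1], [1, 1, 0]]
    results = [clf.classify(comment) for comment in session]
    flags = [is_aggressive for is_aggressive, _ in results]
    print(results)
    print(first_alert_index(flags, min_detections=3))
```

In this toy setup, strongly one-sided comments are decided before all features are read, and a single aggressive comment does not trigger an alert, which mirrors the false-positive concern raised in the abstract.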


  • Published in

    WWW '19: The World Wide Web Conference
    May 2019
    3620 pages
    ISBN: 9781450366748
    DOI: 10.1145/3308558

    Copyright © 2019 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall acceptance rate: 1,899 of 8,196 submissions, 23%
