ABSTRACT
The potentially detrimental effects of cyberbullying have led to the development of numerous automated, data-driven detection approaches, most of which emphasize classification accuracy. Although cyberbullying is not well defined, it is a form of abusive online behavior that is repetitive by nature: a sequence of aggressive messages sent from a bully to a victim over a period of time with the intent to harm the victim. Existing work has focused on harassment as an indicator of cyberbullying (i.e., using profanity to classify toxic comments independently of one another), disregarding the repetitive nature of the harassing process. Raising a cyberbullying alert immediately after a single aggressive comment is detected, however, can lead to a high number of false positives. At the same time, two key practical challenges remain unaddressed: (i) detection timeliness, which is necessary to support victims as early as possible, and (ii) scalability to the staggering rates at which content is generated in online social networks.
In this work, we introduce CONcISE, a novel approach for timely and accurate Cyberbullying detectiON on Instagram media SEssions. We propose a sequential hypothesis testing formulation that seeks to drastically reduce the number of features used in classifying each comment while maintaining high classification accuracy. CONcISE raises an alert only after a certain number of detections have been made. Extensive experiments on a real-world Instagram dataset with ~4M users and ~10M comments demonstrate the effectiveness, scalability, and timeliness of our approach and its benefits over existing methods.
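The abstract does not spell out the formulation, so the following is only a minimal sketch of the two ideas under stated assumptions, not the CONcISE algorithm itself. It assumes binary features that are conditionally independent given the class and uses Wald's sequential probability ratio test as a stand-in for the per-comment sequential test; a simple counter stands in for the session-level alert rule. All names (`sprt_classify`, `session_alert`, `alpha`, `beta`, `threshold`) and all probability values are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def sprt_classify(
    features: Sequence[int],
    p_aggr: Sequence[float],
    p_benign: Sequence[float],
    alpha: float = 0.05,
    beta: float = 0.05,
) -> Tuple[bool, int]:
    """Classify one comment by examining its binary features one at a time.

    p_aggr[i] / p_benign[i] are assumed probabilities (strictly between 0
    and 1) that feature i fires for an aggressive / benign comment.
    Returns (is_aggressive, number_of_features_examined).
    """
    # Wald's stopping thresholds for target error rates alpha and beta.
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    llr = 0.0  # running log-likelihood ratio: aggressive vs. benign
    for used, (x, p1, p0) in enumerate(zip(features, p_aggr, p_benign), start=1):
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return True, used   # enough evidence: aggressive
        if llr <= lower:
            return False, used  # enough evidence: benign
    # Features exhausted without crossing a threshold: fall back to the sign.
    return llr > 0, len(features)


def session_alert(labels: Sequence[bool], threshold: int) -> Optional[int]:
    """Return the 1-based index of the comment at which the number of
    aggressive detections first reaches `threshold`, or None if it never does."""
    count = 0
    for i, is_aggressive in enumerate(labels, start=1):
        count += int(is_aggressive)
        if count >= threshold:
            return i
    return None


if __name__ == "__main__":
    p1 = [0.9, 0.9, 0.9, 0.9]  # hypothetical P(feature fires | aggressive)
    p0 = [0.1, 0.1, 0.1, 0.1]  # hypothetical P(feature fires | benign)
    session = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 1, 1]]
    results = [sprt_classify(c, p1, p0) for c in session]
    print(results)
    print(session_alert([label for label, _ in results], threshold=2))
```

In this toy setting the test often stops after two of the four features, which illustrates how a sequential test can cut the per-comment feature cost, while the counter delays the alert until aggression has been detected repeatedly within the session.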