Abstract
Cyberbullying refers to the bullying and harassment of defenseless or vulnerable people, such as children, teenagers, and women, through any means of communication (e.g., e-mail, text messages, wall posts, tweets) over any online medium (e.g., social media, blogs, online games, virtual reality environments). Its effects may be severe and irreversible, and it has become one of the major problems of cyber-societies in today's electronic world. Preventing cyberbullying and developing timely response mechanisms both require automated and accurate detection of cyberbullying acts. This study focuses on cyberbullying detection in Facebook activity content written in Turkish. Through extensive experiments with various machine learning and deep learning algorithms, the best estimator for the task is chosen and then employed for both cross-domain evaluation and profiling of cyber-aggressive users. The results, obtained with fivefold cross-validation, are evaluated with the macro-averaged F1 score. They show that BERT is the best estimator, with a macro-averaged F1 of 0.928, and that applying it to datasets collected from different online social network (OSN) domains also produces highly satisfactory results. This article also reports detailed profiling of cyber-aggressive users, providing more information than is visible to the naked eye.
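The evaluation protocol named in the abstract (fivefold cross-validation scored by macro-averaged F1) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the `fit`/`predict` callables, and the stratified round-robin fold assignment are all hypothetical, and the abstract does not say whether the folds were stratified.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def stratified_folds(labels, k=5):
    """Assign sample indices to k folds, round-robin within each class.

    Assumption: stratification is used here only to keep class ratios
    similar across folds; the source does not specify the split strategy.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_validated_macro_f1(texts, labels, fit, predict, k=5):
    """Average macro F1 over k held-out folds.

    `fit(train_texts, train_labels)` returns a model and
    `predict(model, test_texts)` returns predicted labels; both are
    hypothetical callables supplied by the caller.
    """
    scores = []
    for held_out in stratified_folds(labels, k):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = fit([texts[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        preds = predict(model, [texts[i] for i in held_out])
        scores.append(macro_f1([labels[i] for i in held_out], preds))
    return sum(scores) / k
```

Because macro averaging weights every class equally, a classifier that ignores the minority class (typically the bullying class in such data) is penalized much more heavily than it would be under plain accuracy.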
Index Terms
- Detection and Cross-domain Evaluation of Cyberbullying in Facebook Activity Contents for Turkish