Detection and Cross-domain Evaluation of Cyberbullying in Facebook Activity Contents for Turkish

Authors:
Onder Coban, Adıyaman University, Adıyaman, Turkey (ORCID: 0000-0001-9404-2583)
Selma Ayse Ozel, Cukurova University, Adana, Turkey (ORCID: 0000-0001-9201-6349)
Ali Inan, Adana Alparslan Turkes Science and Technology University, Adana, Turkey (ORCID: 0000-0002-3149-1565)

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 4, Article No.: 114, pp 1–32. https://doi.org/10.1145/3580393

Published: 25 March 2023

Abstract

Cyberbullying refers to bullying and harassment of defenseless or vulnerable people such as children, teenagers, and women through any means of communication (e.g., e-mail, text messages, wall posts, tweets) over any online medium (e.g., social media, blogs, online games, virtual reality environments). The effects of cyberbullying may be severe and irreversible, and it has become one of the major problems of cyber-societies in today’s electronic world. Preventing cyberbullying and developing timely response mechanisms both require automated and accurate detection of cyberbullying acts. This study focuses on the problem of cyberbullying detection over Facebook activity content written in Turkish. Through extensive experiments with various machine learning and deep learning algorithms, the best estimator for the task is chosen and then employed for both cross-domain evaluation and profiling of cyber-aggressive users. The results are obtained with fivefold cross-validation and evaluated with the average macro F1 score. They show that BERT is the best estimator, with an average macro F1 of 0.928, and that applying it to various datasets collected from different online social network (OSN) domains produces highly satisfying results. This article also reports detailed profiling of cyber-aggressive users, providing more information than is visible to the naked eye.
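
As an illustration of the evaluation protocol named in the abstract, the following is a minimal sketch of an average macro F1 score computed over five folds. It is not the authors' code: the function names (`macro_f1`, `k_fold_indices`, `average_macro_f1`) and the `predict_fold` callback are hypothetical, and the folds here are plain contiguous splits, whereas an actual experiment would more likely shuffle or stratify the data.

```python
from typing import Callable, List, Sequence, Tuple


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds and return (train, test) index lists."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


def average_macro_f1(
    y_true: Sequence[int],
    predict_fold: Callable[[List[int], List[int]], List[int]],
    k: int = 5,
) -> float:
    """Average the per-fold macro F1 scores.

    predict_fold(train_idx, test_idx) is a hypothetical callback that trains a
    model on train_idx and returns predicted labels for test_idx.
    """
    fold_scores = []
    for train_idx, test_idx in k_fold_indices(len(y_true), k):
        y_pred = predict_fold(train_idx, test_idx)
        fold_scores.append(macro_f1([y_true[i] for i in test_idx], y_pred))
    return sum(fold_scores) / len(fold_scores)
```

Macro averaging gives each class equal weight regardless of how often it occurs, which matters when bullying posts are much rarer than non-bullying ones.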

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 22, Issue 4
April 2023
682 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3588902
Editor:
Imed Zitouni
Google, USA
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 March 2023
- Online AM: 17 January 2023
- Accepted: 30 December 2022
- Revised: 21 November 2022
- Received: 25 February 2022
Author Tags
Facebook
cyber-aggression
cyberbullying
online social networks
machine learning
Qualifiers
- research-article