DOI: 10.1145/3106426.3106456
research-article

Presenting a labelled dataset for real-time detection of abusive user posts

Published: 23 August 2017

Abstract

Social media sites allow users to post their own comments online. Most support free-format posting with near real-time publication. However, posts generated by a public audience carry the risk of containing inappropriate and potentially abusive content. The straightforward approach to detecting such content is to filter posts against blacklists of profane terms. However, this lexicon-based filtering is prone to problems with word variations and a lack of context. Although recent machine-learning-based methods have improved detection accuracy, the lack of gold-standard labelled datasets limits the development of this approach. In this work, we present a dataset of user comments labelled through crowdsourcing. Since abusive content can be ambiguous and subjective to the individual reader, we propose an aggregation mechanism for assessing the differing opinions of multiple labellers. In addition, instead of the typical binary categories of abusive or not abusive, we introduce a third class, 'undecided', to capture the real-life scenario of instances that are neither blatantly abusive nor clearly harmless. We have performed preliminary experiments on this dataset using best-practice techniques in text classification. Finally, we have evaluated the detection performance of various feature groups, namely syntactic, semantic and context-based features. Results show that these features can improve our classifier's performance in detecting abusive content by 18%.
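The abstract does not state the exact rule used to aggregate labellers' opinions into the three classes. As an illustration only, the sketch below assumes a simple agreement-threshold rule: a comment is labelled abusive or clean when at least two thirds of its labellers agree, and undecided otherwise. The function name `aggregate_labels`, the class strings and the 2/3 threshold are hypothetical choices, not the mechanism published in the paper.

```python
from collections import Counter
from fractions import Fraction
from typing import Iterable

# Hypothetical class names for the three-way scheme described in the abstract.
ABUSIVE = "abusive"
CLEAN = "clean"
UNDECIDED = "undecided"


def aggregate_labels(votes: Iterable[str], threshold: Fraction = Fraction(2, 3)) -> str:
    """Aggregate individual labellers' binary votes into one of three classes.

    Assumed rule (not the paper's published one): a comment receives a
    definite label only when at least `threshold` of its labellers agree;
    otherwise it is marked 'undecided'.
    """
    counts = Counter(votes)
    # Reject anything other than the two binary vote values.
    unknown = set(counts) - {ABUSIVE, CLEAN}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = counts[ABUSIVE] + counts[CLEAN]
    if total == 0:
        raise ValueError("at least one vote is required")
    # Exact rational arithmetic avoids float rounding at the threshold boundary.
    if Fraction(counts[ABUSIVE], total) >= threshold:
        return ABUSIVE
    if Fraction(counts[CLEAN], total) >= threshold:
        return CLEAN
    return UNDECIDED


if __name__ == "__main__":
    print(aggregate_labels([ABUSIVE, ABUSIVE, CLEAN]))  # abusive (2 of 3 agree)
    print(aggregate_labels([ABUSIVE, CLEAN]))           # undecided (1 of 2 each)
    print(aggregate_labels([CLEAN, CLEAN, CLEAN]))      # clean
```

With any threshold above one half, at most one definite class can be reached, so disagreement among labellers falls through to 'undecided' rather than being forced into a binary label. Raising the threshold moves more borderline comments into the undecided class.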


Cited By

  • (2023) Cyberbullying Detection on Twitter Using Deep Learning-Based Attention Mechanisms and Continuous Bag of Words Feature Extraction. Mathematics 11:16 (3567). DOI: 10.3390/math11163567. Online publication date: 17-Aug-2023.
  • (2022) It's All Relative! A Method to Counter Human Bias in Crowdsourced Stance Detection of News Articles. Proceedings of the ACM on Human-Computer Interaction 6:CSCW2 (1-25). DOI: 10.1145/3555636. Online publication date: 11-Nov-2022.
  • (2022) NOMA—Non-offensive Messaging Application Framework Using Machine Learning Technique for Online Communication Through Social Media. Human-Centric Smart Computing (315-328). DOI: 10.1007/978-981-19-5403-0_27. Online publication date: 29-Nov-2022.

Information

Published In

WI '17: Proceedings of the International Conference on Web Intelligence
August 2017
1284 pages
ISBN: 9781450349512
DOI: 10.1145/3106426
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. abusive detection
  2. feature selection
  3. labelling strategy
  4. machine learning

Qualifiers

  • Research-article

Conference

WI '17

Acceptance Rates

WI '17 Paper Acceptance Rate: 118 of 178 submissions, 66%
Overall Acceptance Rate: 118 of 178 submissions, 66%

Article Metrics

  • Downloads (last 12 months): 13
  • Downloads (last 6 weeks): 2
Reflects downloads up to 17 Feb 2025.
