DOI: 10.1145/3508230.3508234
research-article

Improved Bi-GRU Model for Imbalanced English Toxic Comments Dataset

Published: 08 March 2022

Abstract

Deep learning is widely used for English toxic comment classification. However, most existing studies do not account for data imbalance. Targeting an imbalanced English Toxic Comments Dataset, we propose an improved bidirectional gated recurrent unit (Bi-GRU) model that combines oversampling with a cost-sensitive method. The improved model uses random oversampling (ROS) to reduce the data imbalance, introduces a cost-sensitive method, and adopts a new loss function for the Bi-GRU model. Experimental results show that the improved Bi-GRU model achieves significantly better classification performance on the imbalanced English Toxic Comments Dataset.
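
The two ingredients named in the abstract, random oversampling and a cost-sensitive loss, can be illustrated with a minimal sketch. The abstract does not give the paper's new loss function, so the class-weighted binary cross-entropy below is a generic stand-in chosen for illustration and is not the authors' formulation. The function names, the `pos_weight` parameter, and the toy data are all assumptions.

```python
import math
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement to fill the gap up to the majority size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def weighted_bce(y_true, y_prob, pos_weight, eps=1e-7):
    """Binary cross-entropy in which errors on the positive (toxic) class
    cost `pos_weight` times as much as errors on the negative class.
    This is a generic cost-sensitive loss, not the paper's loss."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Toy data (assumed): four non-toxic comments and one toxic comment.
texts = ["fine", "ok", "nice", "thanks", "toxic!"]
labels = [0, 0, 0, 0, 1]
bal_texts, bal_labels = random_oversample(texts, labels)
counts = Counter(bal_labels)
```

After oversampling, `counts` holds equal numbers of samples per class. In practice, oversampling should be applied to the training split only, so that duplicated comments do not leak into validation or test data.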


Published In

NLPIR '21: Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval
December 2021
175 pages
ISBN:9781450387354
DOI:10.1145/3508230
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Bi-GRU
  2. Cost-sensitive
  3. Imbalanced Data
  4. ROS
  5. Toxic Comments

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Major Project of Natural Science Research Foundation of Education Bureau of Anhui Province, China
  • Project of University Excellent Talents of Education Bureau of Anhui Province, China

Conference

NLPIR 2021

Cited By

  • (2025) Urdu Toxic Comment Classification With PURUTT Corpus Development. IEEE Access 13, 21635-21651. DOI: 10.1109/ACCESS.2025.3535862. Online publication date: 2025
  • (2025) Selection and evaluation of a set of attributes appropriate for detection of antisocial behaviour in online media. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-20514-2. Online publication date: 13-Jan-2025
  • (2024) A survey on textual emotion cause extraction in social networks. Digital Communications and Networks. DOI: 10.1016/j.dcan.2024.07.004. Online publication date: Jul-2024
  • (2024) Introduction. Textual Emotion Classification Using Deep Broad Learning, 1-30. DOI: 10.1007/978-3-031-67718-2_1. Online publication date: 28-Sep-2024
  • (2023) Ensemble Stacking Model for Sentiment Analysis of Emirati and Arabic Dialects. Journal of King Saud University - Computer and Information Sciences 35:8 (101691). DOI: 10.1016/j.jksuci.2023.101691. Online publication date: Sep-2023
