research-article

Toxic Comment Classification Based on Bidirectional Gated Recurrent Unit and Convolutional Neural Network

Authors:

Bao Zhang

Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 3

Article No.: 51, Pages 1 - 12

https://doi.org/10.1145/3488366

Published: 21 December 2021

Abstract

For English toxic comment classification, this paper presents BG-GCNN, a model that combines a bidirectional gated recurrent unit (Bi-GRU) with a convolutional neural network (CNN) optimized by global average pooling. The model treats each type of toxic comment as a separate binary classification task. First, the Bi-GRU extracts sequential (time-series) features from the comment; then the global-pooling-optimized CNN reduces their dimensionality. Finally, a Sigmoid function outputs the classification result. Comparative experiments show that BG-GCNN classifies better than Text-CNN, LSTM, Bi-GRU, and other models. On the toxic comment dataset from the Kaggle competition platform, it reaches a Macro-F1 of 0.62. Its F1 values for three toxic labels (toxic, obscene, and insult) are 0.81, 0.84, and 0.74, respectively, the highest values in the comparative experiment.
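The pipeline described in the abstract (Bi-GRU features, a convolution layer, global average pooling, one Sigmoid output per toxic label, and Macro-F1 scoring) can be illustrated with a minimal, untrained sketch in plain Python. Every name here (`BGGCNNSketch`, `bi_gru`, `macro_f1`, and so on), the layer sizes, the single convolution with ReLU, the filter width, the random weights, the omitted biases, and the choice of six output labels are hypothetical assumptions for illustration; the abstract does not give the paper's actual configuration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng):
    """Small random weights (hypothetical); a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

class GRUCell:
    """GRU cell with update gate z, reset gate r and candidate state n (biases omitted)."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.wz, self.wr, self.wn = (rand_matrix(hidden_dim, input_dim, rng) for _ in range(3))
        self.uz, self.ur, self.un = (rand_matrix(hidden_dim, hidden_dim, rng) for _ in range(3))

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.wz, x), matvec(self.uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.wr, x), matvec(self.ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.wn, x), matvec(self.un, rh))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def run_gru(cell, seq):
    h = [0.0] * cell.hidden_dim
    states = []
    for x in seq:
        h = cell.step(x, h)
        states.append(h)
    return states

def bi_gru(fwd, bwd, seq):
    """Concatenate forward and backward hidden states at each time step."""
    f = run_gru(fwd, seq)
    b = run_gru(bwd, seq[::-1])[::-1]  # realign backward states with the original order
    return [hf + hb for hf, hb in zip(f, b)]

def conv1d_relu(seq, filters, width):
    """'Valid' 1-D convolution over time, then ReLU: (T, C) -> (T - width + 1, F)."""
    out = []
    for t in range(len(seq) - width + 1):
        window = [v for step in seq[t:t + width] for v in step]
        out.append([max(0.0, sum(w * x for w, x in zip(f, window))) for f in filters])
    return out

def global_avg_pool(seq):
    """Average each channel over the time axis: (T, F) -> (F,)."""
    return [sum(col) / len(seq) for col in zip(*seq)]

class BGGCNNSketch:
    """Hypothetical, untrained Bi-GRU -> CNN -> global average pooling -> sigmoid model."""
    def __init__(self, emb_dim=8, hidden=6, n_filters=4, width=3, n_labels=6, seed=0):
        rng = random.Random(seed)
        self.fwd = GRUCell(emb_dim, hidden, rng)
        self.bwd = GRUCell(emb_dim, hidden, rng)
        self.width = width
        self.filters = rand_matrix(n_filters, 2 * hidden * width, rng)
        self.out = rand_matrix(n_labels, n_filters, rng)  # one logistic unit per label

    def predict(self, embeddings):
        states = bi_gru(self.fwd, self.bwd, embeddings)
        feats = conv1d_relu(states, self.filters, self.width)
        pooled = global_avg_pool(feats)
        # Independent sigmoids: each label is its own binary decision.
        return [sigmoid(s) for s in matvec(self.out, pooled)]

def f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p and not t)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def macro_f1(true_by_label, pred_by_label):
    """Unweighted mean of the per-label F1 scores."""
    scores = [f1(t, p) for t, p in zip(true_by_label, pred_by_label)]
    return sum(scores) / len(scores)

rng = random.Random(42)
comment = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(10)]  # 10 stand-in token embeddings
probs = BGGCNNSketch().predict(comment)
print([round(p, 3) for p in probs])
```

Separate sigmoid outputs, rather than a softmax, fit the abstract's framing of each toxic type as its own binary classification, since one comment can carry several labels at once. Global average pooling collapses the variable-length convolution output into a fixed-size vector without adding parameters.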


Cited By

Saeed H, Khalil T, Kamiran F (2025). Urdu Toxic Comment Classification With PURUTT Corpus Development. IEEE Access 13, 21635-21651. Online publication date: 2025. https://doi.org/10.1109/ACCESS.2025.3535862
He L, Lai R, Shao S, Li Z (2024). Design of a Chaotic Communication System Based on Deep Learning With Two-Dimensional Reshaping. IEEE Transactions on Vehicular Technology 73, 7, 10421-10434. Online publication date: Jul 2024. https://doi.org/10.1109/TVT.2024.3382625
Rayani R, Tekula S, Vattigunta S, Kovi N, Namitha K (2024). Leveraging Deep Learning for Detecting Toxicity in Online Comments. 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), 1-6. Online publication date: 24 Jun 2024. https://doi.org/10.1109/ICCCNT61001.2024.10726256

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction



Information

Published In


ACM Transactions on Asian and Low-Resource Language Information Processing Volume 21, Issue 3

May 2022

413 pages

ISSN: 2375-4699

EISSN: 2375-4702

DOI: 10.1145/3505182

Editor:
Imed Zitouni
Google, USA


Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 December 2021

Accepted: 01 August 2021

Received: 01 October 2020

Published in TALLIP Volume 21, Issue 3


Qualifiers

Research-article
Refereed

Funding Sources

National Natural Science Foundation in Higher Education of Anhui, China
Anhui Province Excellent Talents Project


Bibliometrics

Article Metrics

Total Citations: 5
Total Downloads: 372
Downloads (last 12 months): 36
Downloads (last 6 weeks): 1

Reflects downloads up to 27 Feb 2025.


Citations

Madhuri S, Nagalakshmi V (2023). A Novel Blockchain Strategy for Third Party Aware Crosschain Transaction Framework. Wireless Personal Communications: An International Journal 131, 4, 2897-2917. Online publication date: 19 Jul 2023. https://dl.acm.org/doi/10.1007/s11277-023-10588-w
Varghese F, Sasikala P (2023). Secure Data Transmission Using Optimized Cryptography and Steganography Using Syndrome-Trellis Coding. Wireless Personal Communications: An International Journal 130, 1, 551-578. Online publication date: 28 Mar 2023. https://dl.acm.org/doi/10.1007/s11277-023-10298-3
