research-article

Character-Level Chinese Toxic Comment Classification Algorithm Based on CNN and Bi-GRU

Author: Zhongguo Wang

CSSE '22: Proceedings of the 5th International Conference on Computer Science and Software Engineering

Pages 108 - 114

https://doi.org/10.1145/3569966.3570000

Published: 20 December 2022

Abstract

At present, “toxic comment” classification is studied mainly in the English context, whereas the Chinese context is less explored and even lacks a public corpus. Because many comments are short texts with sparse features and strong context dependence, this study proposes a character-level embedding neural network model based on a convolutional neural network (CNN) and a bidirectional gated recurrent unit (Bi-GRU), and applies it to toxic comment classification on the Chinese toxic comment dataset. In the proposed model, the CNN, which combines character- and word-level vectors, extracts the important local features of the text, and the Bi-GRU then captures sequential information in both directions to improve the accuracy of Chinese toxic comment classification. Experimental results show that the proposed model reaches an F1 score of 0.8081, outperforming the comparison models.
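The abstract reports its result as an F1 score. As a reminder of what that number measures, the following is a minimal sketch of the binary F1 computation (the harmonic mean of precision and recall). The function name `f1_score`, the label encoding (1 = toxic, 0 = non-toxic), and the sample labels are illustrative assumptions, not taken from the paper; the abstract does not state which averaging scheme the authors used.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class: 2PR / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight comments (1 = toxic, 0 = non-toxic).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

score = f1_score(y_true, y_pred)
print(round(score, 4))  # 3 TP, 1 FP, 1 FN -> precision = recall = 0.75 -> 0.75
```

Because F1 ignores true negatives, it is often preferred over plain accuracy when the positive class (here, toxic comments) is a small share of the data.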


Cited By

Z. Yu, Y. Jia, and Z. Hong. 2024. Detection System Based on Text Adversarial and Multi-Information Fusion for Inappropriate Comments in Mobile Application Reviews. Electronics 13, 8 (2024), 1432. Online publication date: 10-Apr-2024. https://doi.org/10.3390/electronics13081432

Index Terms

Computing methodologies → Artificial intelligence → Natural language processing

Information

Published In

CSSE '22: Proceedings of the 5th International Conference on Computer Science and Software Engineering

October 2022

753 pages

ISBN: 9781450397780

DOI: 10.1145/3569966

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

CSSE 2022

CSSE 2022: 2022 5th International Conference on Computer Science and Software Engineering

October 21 - 23, 2022

Guilin, China

Acceptance Rates

Overall Acceptance Rate 33 of 74 submissions, 45%
