DOI: 10.1145/3569966.3570000
research-article

Character-Level Chinese Toxic Comment Classification Algorithm Based on CNN and Bi-GRU

Published: 20 December 2022

Abstract

At present, toxic comment classification is studied mainly in the English context, whereas the Chinese context is less explored and even lacks a public corpus. Because many comments are short texts with sparse features and strong context dependence, this study proposes a character-level embedding neural network model based on a convolutional neural network (CNN) and a bidirectional gated recurrent unit (Bi-GRU), and applies it to toxic comment classification on the Chinese toxic comment dataset. In the proposed model, the CNN, which combines character- and word-level vectors, extracts the important local features of the text, and the Bi-GRU then captures sequential information in both directions to improve the accuracy of Chinese toxic comment classification. Experimental results show that the proposed model reaches an F1 score of 0.8081, which is better than that of the comparison models.
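
To make the two framework-independent ends of such a pipeline concrete, the following is a minimal Python sketch, not the authors' implementation. It shows how a Chinese comment might be split into characters and encoded as fixed-length integer ids (the usual input to a character-level embedding layer), and how an F1 score like the one reported could be computed from predictions. The special tokens `<pad>` and `<unk>`, the function names, the `max_len` parameter, and the choice of F1 over the positive (toxic) class are all assumptions made for illustration; the abstract does not specify them. The CNN and Bi-GRU layers themselves are omitted.

```python
from collections import Counter

# Hypothetical special tokens; the paper does not specify these.
PAD, UNK = "<pad>", "<unk>"


def build_char_vocab(texts, min_count=1):
    """Map each character seen at least min_count times to an integer id."""
    counts = Counter(ch for text in texts for ch in text)
    vocab = {PAD: 0, UNK: 1}
    # Sort by character so that ids are deterministic across runs.
    for ch, n in sorted(counts.items()):
        if n >= min_count:
            vocab[ch] = len(vocab)
    return vocab


def encode_chars(text, vocab, max_len):
    """Turn a comment into a fixed-length list of character ids.

    No word segmentation is needed: each Chinese character is one token.
    Longer comments are truncated, shorter ones are right-padded.
    """
    ids = [vocab.get(ch, vocab[UNK]) for ch in text[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R), computed for the positive class only (assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # also avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, after `vocab = build_char_vocab(["今天天气很好", "天气不好"])`, calling `encode_chars("天气好吗", vocab, max_len=6)` yields six ids in which the unseen character 吗 maps to the `<unk>` id and the last two positions are padding. Working at the character level sidesteps segmentation errors on short, informal comments, which is one plausible motivation for the character-level design described in the abstract.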

Cited By

  • (2024) Detection System Based on Text Adversarial and Multi-Information Fusion for Inappropriate Comments in Mobile Application Reviews. Electronics 13:8 (1432). https://doi.org/10.3390/electronics13081432. Online publication date: 10-Apr-2024

    Published In

    CSSE '22: Proceedings of the 5th International Conference on Computer Science and Software Engineering
    October 2022
    753 pages
    ISBN:9781450397780
    DOI:10.1145/3569966
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Bi-GRU
    2. CNN
    3. Character level
    4. Chinese toxic comment

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CSSE 2022

    Acceptance Rates

    Overall Acceptance Rate 33 of 74 submissions, 45%
