DOI: 10.1145/3639479.3639480
research-article

Offensive Text Classification based on Ernie's Dual Channel Composite Model

Published: 28 February 2024

Abstract

With the widespread use of the Internet, offensive text in cyberspace has drawn broad public attention. Current offensive text recognition relies mainly on pre-constructed lists of sensitive words, which cannot reliably intercept offensive text that contains no explicitly offensive words. This article proposes a dual-channel composite model based on the Ernie pre-trained model. First, Ernie is used to construct dynamic (context-dependent) word vectors, and its multi-layer bidirectional Transformer structure yields richer semantic information about the text. A dual-channel network is then added to refine this information further: a Bi-GRU channel extracts global semantics and a TextCNN channel extracts local information, so that semantic features are captured at different levels of abstraction. Experiments on the Chinese offensive language dataset COLDataset show that the model's accuracy and F1 score are significantly better than those of the baseline model COLDetector, reaching 83.81% and 82.90%, respectively. This verifies that adding a dual-channel fusion network to the Ernie pre-trained model extracts text features more effectively and improves the model's classification performance.
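The abstract describes a pipeline in which Ernie produces contextual token vectors, a Bi-GRU channel captures global semantics, a TextCNN channel captures local features, and the two are fused before classification. The block below is a minimal pure-Python sketch of such a dual-channel head, not the authors' implementation. It assumes several details the abstract does not state: fusion is plain concatenation followed by one linear layer and a softmax over two classes; the hidden size (4), kernel widths (2, 3, 4) and filter count (3) are illustrative; weights are random and untrained; biases are omitted; and random vectors stand in for Ernie outputs. Names such as `DualChannelHead`, `bigru_channel` and `textcnn_channel` are hypothetical.

```python
import math
import random

# Hypothetical sketch: dual-channel head (Bi-GRU + TextCNN) over contextual token
# vectors. The vectors stand in for Ernie outputs; no Ernie model is loaded, all
# weights are random (untrained), and biases are omitted for brevity.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

class GRUCell:
    """GRU update: update gate z, reset gate r, candidate n, h' = (1 - z) * n + z * h."""
    def __init__(self, in_dim, hid, rng):
        self.hid = hid
        self.W = {g: rand_matrix(hid, in_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hid, hid, rng) for g in "zrn"}

    def step(self, x, h):
        wx = {g: matvec(self.W[g], x) for g in "zrn"}
        uh = {g: matvec(self.U[g], h) for g in "zrn"}
        z = [sigmoid(a + b) for a, b in zip(wx["z"], uh["z"])]
        r = [sigmoid(a + b) for a, b in zip(wx["r"], uh["r"])]
        n = [math.tanh(a + ri * b) for a, ri, b in zip(wx["n"], r, uh["n"])]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def bigru_channel(seq, fwd, bwd):
    """Global channel: final forward and backward hidden states, concatenated."""
    h_f = [0.0] * fwd.hid
    for x in seq:
        h_f = fwd.step(x, h_f)
    h_b = [0.0] * bwd.hid
    for x in reversed(seq):
        h_b = bwd.step(x, h_b)
    return h_f + h_b

def textcnn_channel(seq, banks):
    """Local channel: 1-D convolutions of several widths, ReLU, max-over-time pooling.
    Assumes len(seq) is at least the largest kernel width."""
    feats = []
    for bank in banks:              # one bank of filters per kernel width
        for kernel in bank:         # kernel: list of k weight vectors
            k = len(kernel)
            responses = [sum(dot(kernel[j], seq[t + j]) for j in range(k))
                         for t in range(len(seq) - k + 1)]
            feats.append(max(0.0, max(responses)))  # ReLU(max) == max(ReLU)
    return feats

class DualChannelHead:
    def __init__(self, in_dim, hid=4, kernel_sizes=(2, 3, 4), n_filters=3,
                 n_classes=2, seed=0):
        rng = random.Random(seed)
        self.fwd = GRUCell(in_dim, hid, rng)
        self.bwd = GRUCell(in_dim, hid, rng)
        self.banks = [[rand_matrix(k, in_dim, rng) for _ in range(n_filters)]
                      for k in kernel_sizes]
        fused_dim = 2 * hid + len(kernel_sizes) * n_filters
        self.W_out = rand_matrix(n_classes, fused_dim, rng)

    def forward(self, seq):
        # Fusion by concatenation (an assumption), then linear layer + softmax.
        fused = bigru_channel(seq, self.fwd, self.bwd) + textcnn_channel(seq, self.banks)
        return softmax(matvec(self.W_out, fused)), fused

if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-in for Ernie output: 6 tokens, 8-dimensional vectors.
    seq = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(6)]
    probs, fused = DualChannelHead(in_dim=8).forward(seq)
    print(len(fused), probs)  # 17 fused features (2*4 + 3*3), 2 class probabilities
```

In practice the same structure would usually be written in a deep-learning framework (for example PyTorch's `nn.GRU` with `bidirectional=True` and `nn.Conv1d`) and trained end to end with the encoder; the pure-Python version only makes the data flow and feature shapes of the two channels explicit.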

    Published In

    ACM Other conferences
    MLNLP '23: Proceedings of the 2023 6th International Conference on Machine Learning and Natural Language Processing
    December 2023
    252 pages
    ISBN:9798400709241
    DOI:10.1145/3639479
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Dual channel
    2. Ernie pre-trained model
    3. Offensive Text

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Key Science and Technology Plan Program of Hainan Province
    • Key Science and Technology Plan Program of Haikou City

    Conference

    MLNLP 2023

    Article Metrics

    • Total citations: 0
    • Total downloads: 37
    • Downloads (last 12 months): 37
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 17 Feb 2025
