DOI: 10.1145/3639479.3639480
research-article

Offensive Text Classification based on Ernie's Dual Channel Composite Model

Published: 28 February 2024

ABSTRACT

With the widespread adoption of the Internet, offensive text in cyberspace has drawn broad public attention. Current offensive text recognition relies mainly on pre-constructed lists of sensitive words, which cannot effectively intercept text that contains no overtly offensive words. This article proposes a dual-channel composite model based on the Ernie pre-trained model. First, the Ernie pre-trained model is used to construct dynamic word vectors, and more effective text semantic information is obtained through its internal multi-layer bidirectional Transformer structure. Then, a dual-channel model is added to further refine the text information: Bi-GRU extracts global semantics and TextCNN extracts local information, so that semantic features are captured at different levels of abstraction. Experiments on the Chinese offensive language dataset COLDataset show that the model's accuracy and F1 score are significantly better than those of the baseline model COLDetector, reaching 83.81% and 82.90%, respectively. This verifies that adding a dual-channel fusion network to the Ernie pre-trained model extracts text features more effectively and improves the model's classification performance.
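The dual-channel idea in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract gives no layer sizes, fusion rule, or training details, so the sketch assumes that the two channels are fused by simple concatenation, that random vectors stand in for Ernie's token embeddings, and that weights are untrained random values with biases omitted. All names and dimensions (`GRUCell`, `bigru_features`, `textcnn_features`, `classify`, `EMB`, `HID`, `WIDTHS`) are hypothetical.

```python
import math
import random

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Small random weight matrix; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class GRUCell:
    """Single GRU step (biases omitted for brevity)."""
    def __init__(self, in_dim, hid_dim):
        # One input/recurrent weight pair per gate: update z, reset r, candidate n.
        self.W = {g: rand_matrix(hid_dim, in_dim) for g in "zrn"}
        self.U = {g: rand_matrix(hid_dim, hid_dim) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        return [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]

def bigru_features(seq, fwd, bwd, hid_dim):
    """Global channel: final hidden states of a forward and a backward pass."""
    h_f = [0.0] * hid_dim
    for x in seq:
        h_f = fwd.step(x, h_f)
    h_b = [0.0] * hid_dim
    for x in reversed(seq):
        h_b = bwd.step(x, h_b)
    return h_f + h_b

def textcnn_features(seq, filters):
    """Local channel: 1-D convolution, ReLU, then max-over-time pooling.
    Assumes the sequence is at least as long as the widest window."""
    feats = []
    for k, kernels in filters.items():      # k = window width in tokens
        for kernel in kernels:              # flat kernel of length k * emb_dim
            acts = []
            for i in range(len(seq) - k + 1):
                window = [v for tok in seq[i:i + k] for v in tok]
                acts.append(max(0.0, sum(w * x for w, x in zip(kernel, window))))
            feats.append(max(acts))
    return feats

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical sizes chosen only to keep the example small.
EMB, HID, N_FILTERS, WIDTHS, N_CLASSES = 8, 4, 3, (2, 3, 4), 2

fwd, bwd = GRUCell(EMB, HID), GRUCell(EMB, HID)
filters = {k: rand_matrix(N_FILTERS, k * EMB) for k in WIDTHS}
classifier = rand_matrix(N_CLASSES, 2 * HID + N_FILTERS * len(WIDTHS))

def classify(seq):
    # Fusion by concatenation (an assumption), then a linear layer and softmax.
    fused = bigru_features(seq, fwd, bwd, HID) + textcnn_features(seq, filters)
    return fused, softmax(matvec(classifier, fused))

# Stand-in for Ernie output: 6 tokens, each an 8-dimensional vector.
tokens = [[rng.uniform(-1, 1) for _ in range(EMB)] for _ in range(6)]
fused, probs = classify(tokens)
print(len(fused), probs)
```

In this sketch the Bi-GRU channel summarizes the whole sequence in both directions, while each TextCNN filter responds to a short window of adjacent tokens, which is one way to read the abstract's distinction between global semantics and local information.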

Published in

MLNLP '23: Proceedings of the 2023 6th International Conference on Machine Learning and Natural Language Processing
December 2023
252 pages
ISBN: 9798400709241
DOI: 10.1145/3639479

Copyright © 2023 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, Research, Refereed limited