Offensive Text Classification based on Ernie's Dual Channel Composite Model

Authors:

Junkuo Cao

MLNLP '23: Proceedings of the 2023 6th International Conference on Machine Learning and Natural Language Processing

Pages 1 - 7

https://doi.org/10.1145/3639479.3639480

Published: 28 February 2024

Abstract

With the widespread popularity of the Internet, offensive text in cyberspace has attracted broad public attention. Current offensive text recognition relies mainly on pre-constructed lists of sensitive words, which cannot effectively intercept text that contains no obviously offensive words. This article proposes a dual-channel composite model based on the Ernie pre-trained model. First, the Ernie pre-trained model is used to construct dynamic word vectors, and its internal multi-layer bidirectional Transformer structure yields more effective text semantic information. Then, a dual-channel model is added to further refine the text information: Bi-GRU extracts global semantics and TextCNN extracts local information, so that semantic features are captured at different levels of abstraction. Experimental results on the Chinese offensive language dataset COLDataset show that the accuracy and F1 value of the model are significantly better than those of the baseline model COLDetector, reaching 83.81% and 82.90%, respectively. This verifies that adding a dual-channel fusion network to the Ernie pre-trained model extracts text features more effectively and improves the classification performance of the model.
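The dual-channel idea described in the abstract can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the author's implementation: the class name `DualChannelClassifier`, all layer sizes, the kernel sizes, and the use of plain concatenation to fuse the two channels are assumptions, since the abstract does not specify them. To keep the sketch self-contained, a small `nn.Embedding` stands in for the pre-trained Ernie encoder; in practice the token representations would come from Ernie's final Transformer layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualChannelClassifier(nn.Module):
    """Encoder -> (Bi-GRU channel, TextCNN channel) -> concatenate -> linear."""

    def __init__(self, vocab_size=1000, hidden=64, gru_hidden=32,
                 num_filters=16, kernel_sizes=(2, 3, 4), num_classes=2):
        super().__init__()
        # ASSUMPTION: an embedding table stands in for the Ernie encoder.
        self.encoder = nn.Embedding(vocab_size, hidden)
        # Channel 1: bidirectional GRU for global (sequence-level) semantics.
        self.gru = nn.GRU(hidden, gru_hidden, batch_first=True,
                          bidirectional=True)
        # Channel 2: TextCNN, one 1-D convolution per n-gram width.
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes])
        fused_dim = 2 * gru_hidden + num_filters * len(kernel_sizes)
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, token_ids):
        x = self.encoder(token_ids)                  # (batch, seq, hidden)

        # Global channel: final hidden state of each GRU direction.
        _, h_n = self.gru(x)                         # (2, batch, gru_hidden)
        global_feat = torch.cat([h_n[0], h_n[1]], dim=1)

        # Local channel: convolve over time, then max-pool each feature map.
        x_c = x.transpose(1, 2)                      # (batch, hidden, seq)
        local_feat = torch.cat(
            [F.relu(conv(x_c)).max(dim=2).values for conv in self.convs],
            dim=1)

        # ASSUMPTION: the two channels are fused by simple concatenation.
        fused = torch.cat([global_feat, local_feat], dim=1)
        return self.classifier(fused)                # (batch, num_classes)


model = DualChannelClassifier()
token_ids = torch.randint(0, 1000, (3, 12))  # 3 dummy texts, 12 tokens each
logits = model(token_ids)
print(logits.shape)  # torch.Size([3, 2])
```

The two output logits correspond to a binary offensive / non-offensive decision, which matches the classification task the abstract describes. The input sequence must be at least as long as the largest kernel size (4 in this sketch).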

Index Terms

1. Social and professional topics
  1. Computing / technology policy
    1. Censorship
      1. Hate speech

Published In

MLNLP '23: Proceedings of the 2023 6th International Conference on Machine Learning and Natural Language Processing

December 2023

252 pages

ISBN: 9798400709241

DOI: 10.1145/3639479

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 February 2024

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Key Science and Technology Plan Program of Hainan Province
Key Science and Technology Plan Program of Haikou City

Conference

MLNLP 2023

MLNLP 2023: 2023 6th International Conference on Machine Learning and Natural Language Processing

December 27 - 29, 2023

Sanya, China
