ABSTRACT
With the widespread use of the Internet, offensive text in cyberspace has attracted broad public attention. Current offensive text recognition relies mainly on pre-constructed lists of sensitive words, which cannot effectively intercept offensive text that contains no explicitly offensive words. This article proposes a dual-channel composite model based on the Ernie pre-trained model. First, Ernie is used to construct dynamic (context-dependent) word vectors, and its internal multi-layer bidirectional Transformer structure yields richer semantic information about the text. A dual-channel network is then added on top to further refine this information: Bi-GRU extracts global semantics and TextCNN extracts local information, so that semantic features are captured at different levels of abstraction. Experimental results on the Chinese offensive language dataset COLDataset show that the model reaches an accuracy of 83.81% and an F1 score of 82.90%, clearly outperforming the baseline model COLDetector on both metrics. This indicates that adding a dual-channel fusion network to the pre-trained Ernie model extracts text features more effectively and improves classification performance.
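The abstract describes the architecture only at a high level. The following is a minimal NumPy sketch of the dual-channel head, assuming the Ernie encoder has already produced one vector per token. Random vectors stand in for those encoder outputs, and all weights are random and untrained. The dimensions (768-d token vectors, 64 GRU units per direction, kernel widths 2/3/4 with 16 filters each), the concatenation-based fusion, and all function names are hypothetical illustrations, not the paper's reported configuration. The GRU uses the update rule h_t = (1 - z_t) h_{t-1} + z_t h̃_t; some libraries swap the role of z.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 768-d token vectors (assumed Ernie-base width), 64 GRU
# units per direction, TextCNN kernel widths 2/3/4 with 16 filters each.
SEQ_LEN, D_MODEL, D_H = 32, 768, 64
KERNELS, N_FILTERS, N_CLASSES = (2, 3, 4), 16, 2


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(d_in, d_h):
    """Random, untrained GRU weights; gates stacked as [update, reset, candidate]."""
    s = 1.0 / np.sqrt(d_h)
    return {
        "W": rng.uniform(-s, s, (3, d_h, d_in)),
        "U": rng.uniform(-s, s, (3, d_h, d_h)),
        "b": np.zeros((3, d_h)),
    }


def gru_last_state(x, p):
    """Run a one-direction GRU over x of shape (seq_len, d_in); return the last hidden state."""
    W, U, b = p["W"], p["U"], p["b"]
    h = np.zeros(U.shape[1])
    for x_t in x:
        z = sigmoid(W[0] @ x_t + U[0] @ h + b[0])        # update gate
        r = sigmoid(W[1] @ x_t + U[1] @ h + b[1])        # reset gate
        n = np.tanh(W[2] @ x_t + U[2] @ (r * h) + b[2])  # candidate state
        h = (1.0 - z) * h + z * n
    return h


def bigru_global(x, p_fw, p_bw):
    """Global channel: concatenate final forward and backward GRU states -> (2 * d_h,)."""
    return np.concatenate([gru_last_state(x, p_fw), gru_last_state(x[::-1], p_bw)])


def textcnn_local(x, filters):
    """Local channel: 1-D convolutions, ReLU, max-over-time pooling (bias omitted)."""
    feats = []
    for w in filters:  # w has shape (k, d_in, n_filters)
        k = w.shape[0]
        windows = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])  # (L-k+1, k, d_in)
        conv = np.einsum("lkd,kdf->lf", windows, w)                     # (L-k+1, n_filters)
        feats.append(np.maximum(conv, 0.0).max(axis=0))                 # (n_filters,)
    return np.concatenate(feats)


def dual_channel_forward(tokens, params):
    """Fuse both channels by concatenation, then a linear layer + softmax over classes."""
    fused = np.concatenate([
        bigru_global(tokens, params["gru_fw"], params["gru_bw"]),
        textcnn_local(tokens, params["filters"]),
    ])
    logits = params["W_out"] @ fused + params["b_out"]
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()


params = {
    "gru_fw": init_gru(D_MODEL, D_H),
    "gru_bw": init_gru(D_MODEL, D_H),
    "filters": [rng.normal(0.0, 0.02, (k, D_MODEL, N_FILTERS)) for k in KERNELS],
    "W_out": rng.normal(0.0, 0.02, (N_CLASSES, 2 * D_H + N_FILTERS * len(KERNELS))),
    "b_out": np.zeros(N_CLASSES),
}

# Random stand-in for Ernie's last-layer token vectors (shape check only).
tokens = rng.normal(size=(SEQ_LEN, D_MODEL))
probs = dual_channel_forward(tokens, params)
print(probs.shape)  # (2,)
```

Concatenating the final forward and backward GRU states gives a sequence-level (global) summary. Max-over-time pooling keeps the strongest n-gram responses wherever they occur in the sentence (local cues). In a trained system the encoder and both channels would typically be fine-tuned together with a cross-entropy loss on labelled data.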
Offensive Text Classification based on Ernie's Dual Channel Composite Model