
Negative Stances Detection from Multilingual Data Streams in Low-Resource Languages on Social Media Using BERT and CNN-Based Transfer Learning Model

Author:

Sanjay Kumar

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 1

Article No.: 9, Pages 1 - 18

https://doi.org/10.1145/3625821

Published: 15 January 2024


Abstract

Online social media allows users to connect with large numbers of people across the globe and facilitates the efficient exchange of information. These platforms serve many of our day-to-day needs. At the same time, however, social media have increasingly been used to transmit negative stances such as derogatory language, hate speech, and cyberbullying. The task of identifying negative stances in social media posts, comments, or tweets is termed negative stance detection. A major challenge in negative stance detection is that much of the content published on social media is multilingual. This work aims to identify negative stances in multilingual data streams in low-resource languages on social media using a hybrid approach that combines transfer learning with a deep convolutional neural network. The proposed work first preprocesses the multilingual datasets by removing irrelevant information such as special characters and hyperlinks. The processed dataset is then passed through a pretrained BERT (bidirectional encoder representations from Transformers) model, fine-tuned on the dataset under consideration, to generate embeddings. The generated word embeddings are then passed to a deep convolutional neural network, which extracts latent features from the texts and filters out uninformative ones. This helps the model learn robustly and effectively from the given dataset and make appropriate predictions on zero-shot data. The article uses several optimization strategies to examine how fine-tuning different BERT layers affects the model’s performance. Extensive experiments are performed on a variety of languages, namely English, French, Italian, Danish, Arabic, Spanish, Indonesian, German, and Portuguese. The experimental results demonstrate the effectiveness and efficiency of the proposed framework.
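The following is a minimal sketch of two stages of the pipeline described in the abstract, written under stated assumptions and not taken from the article itself. The function names `clean_post` and `conv1d_max_pool` are hypothetical. `clean_post` assumes that hyperlinks start with `http://`, `https://`, or `www.`, and it treats every character outside the Unicode letter, mark, and number categories as a "special character," so Arabic, accented Latin, and other scripts survive cleaning. `conv1d_max_pool` is a plain-Python stand-in for the convolutional stage. It uses a ReLU followed by max-over-time pooling, which is a common text-CNN design but not necessarily the article's exact configuration. The small vectors it receives stand in for BERT token embeddings.

```python
import re
import unicodedata

# Assumed URL pattern: anything starting with http(s):// or www. up to whitespace.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def clean_post(text: str) -> str:
    """Remove hyperlinks and special characters from a post in any script."""
    text = unicodedata.normalize("NFC", text)  # compose accents, e.g. o + U+0308 -> ö
    text = URL_RE.sub(" ", text)
    # Keep Unicode letters (L*), marks (M*) and numbers (N*); every other
    # character (punctuation, symbols, emoji, whitespace) becomes a space.
    chars = [ch if unicodedata.category(ch)[0] in "LMN" else " " for ch in text]
    return " ".join("".join(chars).split())  # collapse runs of spaces


def conv1d_max_pool(
    embeddings: list[list[float]],
    filters: list[list[list[float]]],
    biases: list[float],
) -> list[float]:
    """Slide each filter over the token embeddings and keep its strongest response.

    embeddings: seq_len x dim (stand-in for BERT token embeddings)
    filters:    each filter is `width` x dim
    Returns one feature per filter: max over positions of ReLU(w . window + b).
    """
    features = []
    for kernel, bias in zip(filters, biases):
        width = len(kernel)
        # Starting at 0.0 yields max(0, best window score), which equals the
        # maximum of the ReLU outputs; it is also 0.0 if no window fits.
        best = 0.0
        for start in range(len(embeddings) - width + 1):
            score = bias
            for offset, weights in enumerate(kernel):
                token = embeddings[start + offset]
                score += sum(w * x for w, x in zip(weights, token))
            best = max(best, score)
        features.append(best)
    return features


if __name__ == "__main__":
    print(clean_post("Check this out!!! https://t.co/abc123 #hate @user"))
    # prints: Check this out hate user
```

Max-over-time pooling is one concrete way to read "removing the unessential information": each filter reports only its strongest match anywhere in the text, and weaker positions are discarded. In a real system, the hand-rolled loop would be replaced by a framework convolution layer applied to the fine-tuned BERT outputs. Note also that this cleaning rule turns apostrophes into spaces ("C'est" becomes "C est"), which may or may not match the article's preprocessing.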

