research-article

Negative Stances Detection from Multilingual Data Streams in Low-Resource Languages on Social Media Using BERT and CNN-Based Transfer Learning Model

Published: 15 January 2024

Abstract

Online social media allows users to connect with a large number of people across the globe and facilitates the efficient exchange of information. These platforms cater to many of our day-to-day needs. At the same time, however, social media have increasingly been used to transmit negative stances such as derogatory language, hate speech, and cyberbullying. The task of identifying negative stances in social media posts, comments, or tweets is termed negative stance detection. One of the major challenges in negative stance detection is that much of the content published on social media is multilingual. This work aims to identify negative stances from multilingual data streams in low-resource languages on social media using a hybrid approach that combines transfer learning with a deep convolutional neural network. The proposed work starts by preprocessing the multilingual datasets to remove irrelevant information such as special characters and hyperlinks. The processed dataset is then passed through a pretrained BERT (bidirectional encoder representations from Transformers) model, which is fine-tuned on the dataset under consideration to generate word embeddings. The generated word embeddings are then passed to a deep convolutional neural network, which extracts latent features from the texts and discards inessential information. This helps our model learn robustly and effectively from the given dataset and make appropriate predictions on zero-shot data. The article applies several optimization strategies to examine the impact of fine-tuning different BERT layers on the model's performance. Extensive experiments are performed on nine languages: English, French, Italian, Danish, Arabic, Spanish, Indonesian, German, and Portuguese. The experimental results demonstrate the effectiveness and efficiency of the proposed framework.
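The preprocessing step described in the abstract (removing hyperlinks and special characters from multilingual posts) could look roughly like the sketch below. The abstract does not give the exact cleaning rules, so the function name `clean_post`, the URL pattern, and the choice to treat every non-letter, non-digit character as a "special character" are all assumptions made for illustration, not the authors' implementation.

```python
import re

# Assumption: a hyperlink is anything starting with http://, https:// or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Assumption: a "special character" is anything that is not a Unicode
# letter/digit or whitespace. Underscore is listed separately because
# Python's \w also matches "_".
SPECIAL_RE = re.compile(r"[^\w\s]|_")
WHITESPACE_RE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Strip hyperlinks and special characters from one social media post."""
    text = URL_RE.sub(" ", text)         # drop hyperlinks first
    text = SPECIAL_RE.sub(" ", text)     # then punctuation, '#', '@', emoji, ...
    return WHITESPACE_RE.sub(" ", text).strip()  # collapse leftover spaces


if __name__ == "__main__":
    print(clean_post("Check this!!! https://t.co/abc #hate @user_1"))
```

Because `\w` in Python's `re` module is Unicode-aware for `str` patterns, accented Latin letters and Arabic letters survive the cleaning. Combining marks (for example Arabic diacritics) are not matched by `\w`, so a real pipeline for those scripts would likely need a more careful rule than this sketch uses.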


Cited By

  • (2025) Devanagari Character Recognition: A Comprehensive Literature Review. IEEE Access 13, 1249-1284. DOI: 10.1109/ACCESS.2024.3520248. Online publication date: 2025.
  • (2024) Artificial Intelligence inspired method for cross-lingual cyberhate detection from low resource languages. ACM Transactions on Asian and Low-Resource Language Information Processing 23:9, 1-23. DOI: 10.1145/3677176. Online publication date: 11-Jul-2024.


Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 1
January 2024
385 pages
EISSN: 2375-4702
DOI: 10.1145/3613498

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 15 January 2024
Online AM: 30 September 2023
Accepted: 20 September 2023
Revised: 07 July 2023
Received: 25 September 2022
Published in TALLIP Volume 23, Issue 1


Author Tags

1. Bidirectional encoder representations from Transformers (BERT)
2. Convolutional neural network (CNN)
3. Deep transfer learning
4. Low-resource languages
5. Negative stances detection
6. Machine learning classifier

