research-article

Deep Ensemble Network for Sentiment Analysis in Bi-lingual Low-resource Languages

Author:

Pradeep Kumar Roy

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 1

Article No.: 8, Pages 1 - 16

https://doi.org/10.1145/3600229

Published: 15 January 2024

Abstract

Sentiment analysis (SA) is the systematic identification, extraction, quantification, and study of affective states and subjective information using natural language processing (NLP). It is widely used to analyze user feedback, such as reviews or social media posts. Recently, SA has become one of the most active research areas in NLP because of its wide range of applications, including e-commerce, healthcare, and the hotel business. Many machine learning and deep learning models exist to predict the sentiment of a user’s post. However, sentiment analysis in low-resource languages such as Kannada, Malayalam, Telugu, and Tamil has received less attention because of language complexity and the limited availability of the required resources. This research addresses that gap by proposing an ensemble model for predicting the sentiment of code-mixed Kannada and Malayalam text. The ensemble of transformer-based models achieved a promising weighted F₁-score of 0.66 on the Kannada code-mixed dataset. In contrast, the ensemble of deep learning models performed best on the Malayalam dataset, achieving a weighted F₁-score of 0.72 and outperforming existing research.
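
The two technical terms in the abstract, an ensemble of classifiers and the weighted F₁-score, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes soft voting (averaging class probabilities across models), which is one common way to combine classifiers; the abstract does not state which combination rule was used. The function names, labels, and probabilities below are hypothetical. The weighted F₁-score is the per-class F₁ averaged with weights proportional to the number of true examples in each class.

```python
from collections import Counter


def soft_vote(prob_lists):
    """Average per-class probabilities from several models, then take the argmax.

    prob_lists: one entry per model; each entry is a list of samples,
    and each sample is a dict mapping label -> probability.
    Soft voting is an assumption here, not the paper's confirmed method.
    """
    n_models = len(prob_lists)
    predictions = []
    for sample_probs in zip(*prob_lists):  # one tuple of dicts per sample
        labels = sample_probs[0].keys()
        avg = {lab: sum(p[lab] for p in sample_probs) / n_models for lab in labels}
        predictions.append(max(avg, key=avg.get))
    return predictions


def weighted_f1(y_true, y_pred):
    """Per-class F1, weighted by each class's share of the true labels."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Hypothetical probabilities from two models on three comments (made-up numbers).
model_a = [
    {"positive": 0.7, "negative": 0.3},
    {"positive": 0.4, "negative": 0.6},
    {"positive": 0.6, "negative": 0.4},
]
model_b = [
    {"positive": 0.6, "negative": 0.4},
    {"positive": 0.3, "negative": 0.7},
    {"positive": 0.2, "negative": 0.8},
]
y_true = ["positive", "negative", "positive"]

y_pred = soft_vote([model_a, model_b])
score = weighted_f1(y_true, y_pred)
```

In this toy example the two models disagree on the third comment, and the averaged probabilities favor the wrong label, so the weighted F₁-score falls below 1. Weighting by class support matters for datasets with imbalanced sentiment classes, because frequent classes then contribute more to the final score.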

Index Terms

1. Computing methodologies
  1. Machine learning

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 23, Issue 1

January 2024

385 pages

EISSN:2375-4702

DOI:10.1145/3613498

Editor:
Imed Zitouni
Google, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 January 2024

Online AM: 27 May 2023

Accepted: 21 May 2023

Revised: 04 March 2023

Received: 30 September 2022

Published in TALLIP Volume 23, Issue 1
