Deep Ensemble Network for Sentiment Analysis in Bi-lingual Low-resource Languages

Published: 15 January 2024

Abstract

Sentiment analysis (SA) is the systematic identification, extraction, quantification, and study of affective states and subjective information using natural language processing (NLP). It is widely used to analyze user feedback, such as reviews or social media posts. SA has recently become one of the most active research areas in NLP because of its wide range of applications, including e-commerce, healthcare, and the hotel business. Many machine learning and deep learning models exist to predict the sentiment of a user’s post. However, sentiment analysis in low-resource languages such as Kannada, Malayalam, Telugu, and Tamil has received less attention because of language complexity and the limited availability of the required resources. This research addresses that gap by proposing an ensemble model for predicting the sentiment of code-mixed Kannada and Malayalam text. The ensemble of transformer-based models achieved a promising weighted F1-score of 0.66 on the Kannada code-mixed dataset. In contrast, the ensemble of deep learning models performed best on the Malayalam dataset, achieving a weighted F1-score of 0.72 and outperforming existing research.
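
The weighted F1-score reported above is commonly defined as the mean of the per-class F1 scores, with each class weighted by its share of the true labels. The following is a minimal sketch of that definition in plain Python. The function name `weighted_f1` and the toy labels are illustrative assumptions and do not come from the paper, which may have computed the metric with a library implementation.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is > 0 because n_true > 0.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n_true / total) * f1
    return score


# Toy labels invented for illustration only (not taken from the paper's datasets).
y_true = ["pos", "pos", "pos", "neg", "neg", "neu"]
y_pred = ["pos", "pos", "neg", "neg", "neu", "neu"]
score = weighted_f1(y_true, y_pred)
print(round(score, 4))
```

Because each class is weighted by its support, frequent classes dominate the score, which matters for imbalanced sentiment datasets.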

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 23, Issue 1
    January 2024
    385 pages
    EISSN:2375-4702
    DOI:10.1145/3613498

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 January 2024
    Online AM: 27 May 2023
    Accepted: 21 May 2023
    Revised: 04 March 2023
    Received: 30 September 2022
    Published in TALLIP Volume 23, Issue 1

    Author Tags

    1. Sentiment analysis
    2. code-mixed
    3. transformer
    4. BERT
    5. Kannada
    6. Malayalam
    7. ensemble learning
    8. deep learning
    9. machine learning

    Qualifiers

    • Research-article

    Cited By

    • (2025) Leveraging Multilingual Transformer for Multiclass Sentiment Analysis in Code-Mixed Data of Low-Resource Languages. IEEE Access 13, 7538–7554. DOI: 10.1109/ACCESS.2025.3527710. Online publication date: 2025.
    • (2025) Hierarchical Attention-enhanced Contextual CapsuleNet for Multilingual Hope Speech Detection. Expert Systems with Applications 268, 126285. DOI: 10.1016/j.eswa.2024.126285. Online publication date: Apr-2025.
    • (2024) Exhaustive Study into Machine Learning and Deep Learning Methods for Multilingual Cyberbullying Detection in Bangla and Chittagonian Texts. Electronics 13:9, 1677. DOI: 10.3390/electronics13091677. Online publication date: 26-Apr-2024.
    • (2024) Text Augmentation to Overcome Data Limitations in Sentiment Analysis for Bahasa Indonesia. 2024 IEEE International Conference on Data and Software Engineering (ICoDSE), 217–222. DOI: 10.1109/ICoDSE63307.2024.10829895. Online publication date: 30-Oct-2024.
    • (2024) Sentiment Analysis in Low-Resource Settings: A Comprehensive Review of Approaches, Languages, and Data Sources. IEEE Access 12, 66883–66909. DOI: 10.1109/ACCESS.2024.3398635. Online publication date: 2024.
    • (2024) Sentiment Analysis for Code-Mixed Data Using Cellular Automata with Deep Learning Models. Cellular Automata, 163–176. DOI: 10.1007/978-3-031-71552-5_14. Online publication date: 9-Sep-2024.
    • (2023) Preparation of Rich Lists of Research Gaps in the Specific Sentiment Analysis Tasks of Code-mixed Indian Languages. SN Computer Science 5:1. DOI: 10.1007/s42979-023-02408-6. Online publication date: 19-Dec-2023.
