research-article

Cross-lingual Sentiment Analysis of Tamil Language Using a Multi-stage Deep Learning Architecture

Published: 19 December 2023

Abstract

In recent years, sentiment analysis has become a focal point of natural language processing research. Cross-lingual sentiment analysis is a particularly demanding yet essential task: it seeks to build models that can analyze sentiment effectively across many languages. The primary motivation for this research is to close a gap left by current techniques, which often perform poorly on low-resource languages because large annotated datasets are scarce and because these languages have distinctive linguistic characteristics. To address these challenges, we propose a novel Multi-Stage Deep Learning Architecture (MSDLA) for cross-lingual sentiment analysis of Tamil, a low-resource language. Our approach uses transfer learning from a resource-rich source language to overcome data limitations. The proposed model significantly outperforms existing methods on the Tamil Movie Review dataset, achieving an accuracy, precision, recall, and F1-score of 0.8772, 0.8614, 0.8825, and 0.8718, respectively. An ANOVA-based statistical comparison shows that MSDLA's improvements over other models, including mT5, XLM, mBERT, ULMFiT, BiLSTM, LSTM with Attention, and ALBERT with Hugging Face English Embedding, are significant, with all p-values below 0.005. Ablation studies confirm the importance of both cross-lingual semantic attention and domain adaptation in the architecture: removing each of these components reduces accuracy to 0.8342 and 0.8043, respectively. Furthermore, MSDLA shows robust cross-domain performance on the Tamil News Classification and Thirukkural datasets, achieving accuracies of 0.8551 and 0.8624, respectively, and significantly outperforming the baseline models. These findings demonstrate the robustness and efficacy of the approach and contribute to cross-lingual sentiment analysis techniques, especially for low-resource languages.
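As a quick consistency check on the reported figures, the F1-score should equal the harmonic mean of precision and recall, F1 = 2PR / (P + R). The minimal Python sketch that follows recomputes it from the precision and recall stated in the abstract and also derives the accuracy drop for each ablation. It uses only numbers given in the abstract; the helper `f1_score` and the dictionary keys are illustrative names of our own, not part of the paper or of any library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Scores reported in the abstract for MSDLA on the Tamil Movie Review dataset.
reported = {"accuracy": 0.8772, "precision": 0.8614, "recall": 0.8825, "f1": 0.8718}

# Recompute F1 from the reported precision and recall.
f1 = f1_score(reported["precision"], reported["recall"])
print(f"recomputed F1 = {f1:.4f}")  # recomputed F1 = 0.8718

# Accuracy after removing each component (ablation results from the abstract).
ablations = {
    "without cross-lingual semantic attention": 0.8342,
    "without domain adaptation": 0.8043,
}
for name, acc in ablations.items():
    drop = reported["accuracy"] - acc  # absolute accuracy lost by the ablation
    print(f"{name}: accuracy {acc:.4f}, drop {drop:.4f}")
```

The recomputed value (about 0.8718) agrees with the reported F1-score, so the four reported metrics are internally consistent; the ablations correspond to accuracy drops of 0.0430 and 0.0729.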





      Information

      Published In

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 12
      December 2023
      194 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3638035

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 19 December 2023
      Online AM: 01 November 2023
      Accepted: 20 October 2023
      Revised: 22 August 2023
      Received: 29 April 2023
      Published in TALLIP Volume 22, Issue 12


      Author Tags

      1. Artificial intelligence
      2. attention mechanism
      3. cross-lingual sentiment analysis
      4. Tamil sentiment analysis



      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 137
      • Downloads (last 6 weeks): 13
      Reflects downloads up to 15 Feb 2025

      Cited By
      • (2025) Open challenges and opportunities in federated foundation models towards biomedical healthcare. BioData Mining 18:1. DOI: 10.1186/s13040-024-00414-9. Online publication date: 4-Jan-2025.
      • (2025) An integrated framework for emotion and sentiment analysis in Tamil and Malayalam visual content. Language Resources and Evaluation. DOI: 10.1007/s10579-024-09804-1. Online publication date: 5-Jan-2025.
      • (2024) Sentiment Analysis in Low-Resource Settings: A Comprehensive Review of Approaches, Languages, and Data Sources. IEEE Access 12, 66883–66909. DOI: 10.1109/ACCESS.2024.3398635. Online publication date: 2024.
      • (2024) Explainable machine learning models for early gastric cancer diagnosis. Scientific Reports 14:1. DOI: 10.1038/s41598-024-67892-z. Online publication date: 29-Jul-2024.
      • (2024) A novel socio-pragmatic framework for sentiment analysis in Dravidian–English code-switched texts. Knowledge-Based Systems 300:C. DOI: 10.1016/j.knosys.2024.112248. Online publication date: 18-Nov-2024.
      • (2024) OntoXAI: a semantic web rule language approach for explainable artificial intelligence. Cluster Computing 27:10, 14951–14975. DOI: 10.1007/s10586-024-04682-2. Online publication date: 7-Aug-2024.
      • (2024) Detecting Offensive Language in Tamil YouTube Comments. Computing and Machine Learning, 407–420. DOI: 10.1007/978-981-97-7571-2_31. Online publication date: 25-Dec-2024.
      • (2024) Comprehensive Analysis on Image Captioning Approaches. Computing and Machine Learning, 359–371. DOI: 10.1007/978-981-97-7571-2_28. Online publication date: 25-Dec-2024.
