Zero-shot learning based cross-lingual sentiment analysis for sanskrit text with insufficient labeled data

Kumar, Puneet; Pathania, Kshitij; Raman, Balasubramanian

doi:10.1007/s10489-022-04046-6

Published: 15 August 2022

Volume 53, pages 10096–10113, (2023)

Applied Intelligence

Abstract

This paper proposes a novel method for analyzing the sentiment expressed in Sanskrit text. Sanskrit is one of the world’s most ancient languages; however, natural language processing tasks such as machine translation and sentiment analysis have not been explored for it to their full potential because sufficient labeled data is unavailable. We address this issue with a zero-shot learning-based cross-lingual sentiment analysis (CLSA) approach. CLSA uses resources from a source language to improve sentiment analysis in a target language that has insufficient resources. The proposed work uses a transformer model to translate text from Sanskrit, a language with insufficient labeled data, to English, which has sufficient labeled data for sentiment analysis. A generative adversarial network-based strategy is proposed to evaluate the maturity of the translations. A bidirectional long short-term memory-based model then classifies the sentiments using the embeddings obtained through the translations. The proposed technique achieves 87.50% accuracy for machine translation and 92.83% accuracy for sentiment classification. The Sanskrit-English translations used in this work were collected through web scraping. Because ground-truth sentiment class labels are absent, a strategy for evaluating the sentiment scores of the proposed sentiment analysis model is also presented. A new dataset of Sanskrit text, along with English translations and sentiment scores, has been constructed.
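The translate-then-classify idea behind CLSA can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy dictionary stands in for the transformer translation model, the small English polarity lexicon stands in for the bidirectional LSTM classifier, and every name below (`TOY_SA_EN`, `TOY_EN_POLARITY`, `translate`, `classify_english`, `cross_lingual_sentiment`) is hypothetical. The only point it shows is that sentiment knowledge lives entirely on the English side, so no Sanskrit sentiment labels are needed.

```python
from typing import Callable, Dict, List

# Hypothetical toy resources for illustration only; not the paper's data or models.
# Transliterated Sanskrit word -> English gloss (stand-in for the translation model).
TOY_SA_EN: Dict[str, str] = {
    "sukha": "happiness",
    "duhkha": "sorrow",
    "shanti": "peace",
    "krodha": "anger",
}

# English polarity lexicon (stand-in for a classifier trained on labeled English data).
TOY_EN_POLARITY: Dict[str, int] = {
    "happiness": 1,
    "peace": 1,
    "sorrow": -1,
    "anger": -1,
}


def translate(tokens: List[str]) -> List[str]:
    """Word-by-word stand-in for translation; tokens missing from the dictionary are dropped."""
    return [TOY_SA_EN[t] for t in tokens if t in TOY_SA_EN]


def classify_english(tokens: List[str]) -> str:
    """Sum the polarities of the English tokens and map the total to a label."""
    score = sum(TOY_EN_POLARITY.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def cross_lingual_sentiment(
    text: str,
    translator: Callable[[List[str]], List[str]] = translate,
    classifier: Callable[[List[str]], str] = classify_english,
) -> str:
    """Target-language text -> source-language tokens -> source-language sentiment label."""
    return classifier(translator(text.lower().split()))


if __name__ == "__main__":
    for sample in ["sukha shanti", "krodha duhkha", "sukha duhkha"]:
        print(sample, "->", cross_lingual_sentiment(sample))
```

Because the translator and classifier are passed in as plain callables, either stand-in could be swapped for a trained model without changing the pipeline function.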

Materials Availability

Available at https://github.com/MIntelligence-Group/SanskritTSA.

Code Availability

Available at https://github.com/MIntelligence-Group/SanskritTSA.

Acknowledgements

The authors would like to thank Prof. Anil Kumar Gourishetty (Physics Department, Indian Institute of Technology Roorkee) for his valuable suggestions and Prof. Nagendra Kumar (Department of Humanities and Social Sciences, Indian Institute of Technology Roorkee) for thoroughly editing and proofreading the paper’s manuscript. We are also thankful to the editors and reviewers who helped improve the paper’s quality through valuable and constructive review comments.

Funding

This research was supported by the Ministry of Human Resource Development (MHRD), India, under reference grant number 1-3146198040.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, India
Puneet Kumar & Balasubramanian Raman
Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, India
Kshitij Pathania

Contributions

Puneet Kumar: Methodology, Implementation, Experiments, Result Analysis, Writing - original draft & editing. Kshitij Pathania: Data Curation, Implementation, Conceptualization, Validation, Writing - review. Balasubramanian Raman: Conceptualization, Writing - review, Supervision, Project administration.

Corresponding author

Correspondence to Puneet Kumar.

Ethics declarations

Conflict of Interests

The authors declare no conflict of interest.

Consent for Publication

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kumar, P., Pathania, K. & Raman, B. Zero-shot learning based cross-lingual sentiment analysis for sanskrit text with insufficient labeled data. Appl Intell 53, 10096–10113 (2023). https://doi.org/10.1007/s10489-022-04046-6

Accepted: 27 July 2022
Published: 15 August 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s10489-022-04046-6
