Abstract
This paper presents an MTS from Sanskrit to English that hybridizes the direct and rule-based machine translation techniques. It also discusses the language divergence between Sanskrit and English and recommends a solution for handling it. The proposed system uses two bilingual dictionaries (Sanskrit–English, Sanskrit–UNL), a tagged Sanskrit corpus, a Sanskrit analysis rule base and an ELGR base. Elasticsearch speeds up translation by accelerating access to the dictionaries and rule bases used by the system. The system uses a CFG in CNF to model Sanskrit and the CYK parsing technique to process the input Sanskrit sentence. This work also presents a novel algorithm that creates a parse tree from the parsing table. The ELGR base and the bilingual dictionaries generate the target language sentence. The proposed system was evaluated using the Natural Language Toolkit API in Python and achieved a BLEU score of 0.7606, a fluency score of 3.63 and an adequacy score of 3.72. A comparison with state-of-the-art systems shows that the proposed system outperforms existing systems.





Abbreviations
- AI: Artificial intelligence
- API: Application programming interface
- BLEU: Bilingual evaluation understudy
- CBMT: Corpus-based machine translation
- CFG: Context-free grammar
- CLR: Canonical syntactic realization
- CNF: Chomsky normal form
- CYK: Cocke–Younger–Kasami
- DMT: Direct machine translation
- GLR: Generalized linking routine
- HBMT: Hybrid-based machine translation
- LCS: Lexical conceptual structure
- MT: Machine translation
- MTS: Machine translation system
- POS: Part of speech
- RBMT: Rule-based machine translation
- TLGR: Target language generation rule
- UNL: Universal networking language
Ethics declarations
Conflict of interest
We have no conflicts of interest to disclose.
Human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1: Generating parsing table and parse tree using CYK parser
In this section, Table 4 shows how the CYK parser processes an example Sanskrit sentence. Figures 6 and 7 depict the generation of the parse tree from the parsing table.
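To make the two steps concrete, the following is a minimal Python sketch of CYK table filling for a grammar in CNF, followed by parse tree recovery from the filled table. It is not the paper's rule base or its novel tree-building algorithm, neither of which is reproduced here. The grammar symbols (`S`, `NP`, `VP`, `V`), the transliterated words, and the function names `cyk_parse` and `build_tree` are all assumptions chosen for illustration, and the tree is recovered with the standard backpointer technique.

```python
# Hypothetical toy grammar in Chomsky normal form (not the paper's rule base).
# Binary rules map a pair of right-hand-side symbols (B, C) to every A with A -> B C.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("NP", "V"): {"VP"},
}
# Lexical rules map a word to every A with A -> word (ASCII transliteration, illustrative).
LEXICAL_RULES = {
    "ramah": {"NP"},
    "phalam": {"NP"},
    "khadati": {"V"},
}


def cyk_parse(words):
    """Fill the CYK table; table[i][j] covers words[i..j] inclusive.

    Each cell maps a nonterminal to a backpointer: the word itself for a
    lexical rule, or (split, left_symbol, right_symbol) for a binary rule.
    """
    n = len(words)
    table = [[{} for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        for lhs in LEXICAL_RULES.get(word, ()):
            table[i][i][lhs] = word
    for length in range(2, n + 1):            # span length
        for i in range(n - length + 1):       # span start
            j = i + length - 1                # span end
            for k in range(i, j):             # last index of the left part
                for left in table[i][k]:
                    for right in table[k + 1][j]:
                        for lhs in BINARY_RULES.get((left, right), ()):
                            # Keep only the first derivation found for each symbol.
                            table[i][j].setdefault(lhs, (k, left, right))
    return table


def build_tree(table, i, j, symbol):
    """Follow backpointers to rebuild a parse tree as nested tuples."""
    back = table[i][j][symbol]
    if isinstance(back, str):                 # lexical rule: leaf node
        return (symbol, back)
    k, left, right = back
    return (symbol,
            build_tree(table, i, k, left),
            build_tree(table, k + 1, j, right))


if __name__ == "__main__":
    words = ["ramah", "phalam", "khadati"]
    table = cyk_parse(words)
    if "S" in table[0][len(words) - 1]:
        print(build_tree(table, 0, len(words) - 1, "S"))
    else:
        print("input is not derivable from S")
```

Storing a backpointer in each cell is one common design choice. It keeps a single derivation per symbol, so an ambiguous sentence would yield only the first tree found.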
Appendix 2: Target language generation rule base
This section provides the TLGR base, which covers the three voices of Sanskrit and their English equivalents. Table 5 tabulates the three voices and ten tenses of Sanskrit together with the rules for generating the equivalent English translation.
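A rule base of this kind can be pictured as a lookup keyed by voice and tense. The Python sketch below is only an illustration under that assumption. The rule entries, the slot names, and the function `generate_english` are hypothetical and are not taken from Table 5, and only a handful of the voice and tense combinations are shown.

```python
# Hypothetical rule base: (voice, tense) -> English sentence template.
# These entries are invented for illustration; they are not the rules of Table 5.
TLGR_RULES = {
    ("active", "present"): "{subject} {verb_present} {object}",
    ("active", "past"): "{subject} {verb_past} {object}",
    ("active", "future"): "{subject} will {verb_base} {object}",
    ("passive", "present"): "{object} is {verb_participle} by {subject}",
}


def generate_english(voice, tense, slots):
    """Pick the template for (voice, tense) and fill it with English word forms."""
    template = TLGR_RULES.get((voice, tense))
    if template is None:
        raise KeyError(f"no generation rule for voice={voice!r}, tense={tense!r}")
    sentence = template.format(**slots)       # unused slots are ignored
    return sentence[0].upper() + sentence[1:] + "."


if __name__ == "__main__":
    # Word forms as they might come out of a bilingual dictionary lookup (assumed).
    slots = {
        "subject": "Rama",
        "object": "a fruit",
        "verb_base": "eat",
        "verb_present": "eats",
        "verb_past": "ate",
        "verb_participle": "eaten",
    }
    print(generate_english("active", "present", slots))
    print(generate_english("passive", "present", slots))
```

Keeping the rules as data rather than code means that a new voice or tense needs only a new table entry, which matches the tabular presentation the appendix describes.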
Appendix 3: Implementation of the proposed Sanskrit-to-English MTS
This section shows the software implementation of the proposed Sanskrit-to-English translator using an example.
Cite this article
Sitender, Bawa, S. A Sanskrit-to-English machine translation using hybridization of direct and rule-based approach. Neural Comput & Applic 33, 2819–2838 (2021). https://doi.org/10.1007/s00521-020-05156-3