DOI: 10.1145/3647444.3647879
research-article

A Comprehensive Guide to Natural Language Processing in Sanskrit with Named Entity Recognition

Published: 13 May 2024

Abstract

Named Entity Recognition (NER) is a technique for recognizing and categorizing entities such as names, locations, organizations, and numbers in a file or image. In Natural Language Processing (NLP), NER is used to ease text extraction. It is widely used for automated text processing across a range of enterprises, as well as in academic research, artificial intelligence, robotics, and bioinformatics. Earlier NER research focused on handwritten rules; NER systems are now developed with machine learning models such as the Hidden Markov Model (HMM), Maximum Entropy (MaxEnt), the Maximum Entropy Markov Model (MEMM), the Support Vector Machine (SVM), and Conditional Random Fields (CRFs). The goal of concentrating on Sanskrit is to establish its applicability, examine its structure, and apply accurate NER methods. This study presents a breakdown of the strategies used for NER in Sanskrit: rule-based methods, machine learning methods, and hybrid methods that combine both. It explores and contrasts the issues of NER for the Sanskrit language using conventional evaluation measures such as accuracy, precision, recall, and F-measure.
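
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes entity-level exact matching, where a predicted entity counts as correct only if its span and label both match the gold annotation; the abstract does not say which matching scheme the surveyed systems use. The function name `precision_recall_f1`, the `(start, end, label)` tuple layout, and the sample entities are hypothetical and not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical representation: (start token index, end token index, label).
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F-measure under exact matching."""
    true_positives = len(gold & pred)  # same span and same label
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Illustrative data only: three gold entities, two predictions, one exact match.
gold = {(0, 1, "PERSON"), (3, 4, "LOCATION"), (6, 7, "ORGANIZATION")}
pred = {(0, 1, "PERSON"), (3, 4, "PERSON")}

p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The second prediction has the right span but the wrong label, so under exact matching it is both a false positive and a missed gold entity. Token-level accuracy, also listed in the abstract, would be computed per token instead and is not shown here.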

Published In

ICIMMI '23: Proceedings of the 5th International Conference on Information Management & Machine Intelligence
November 2023
1215 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Hidden Markov Models (HMM)
  2. Maximum Entropy
  3. Named Entity Recognition (NER)
  4. Natural Language Processing
  5. Sanskrit
  6. Support Vector Machine

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICIMMI 2023
