Prediction of POS Tagging for Unknown Words for Specific Hindi and Marathi Language

Chiplunkar, Kirti; Kharche, Meghna; Chaudhari, Tejaswini; Shaligram, Saurabh; Limkar, Suresh

doi:10.1007/978-981-15-5679-1_13

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1177)

Abstract

Part-of-speech (POS) tagging for Indian languages such as Hindi and Marathi remains largely unexplored. Some of the best taggers available for Indian languages combine machine learning or stochastic techniques with phonetic information. The corpora available for Hindi and Marathi are limited, so Natural Language Processing (NLP) applied to Hindi and Marathi sentences often fails to achieve the desired results. In particular, current POS tagging techniques assign an UNKNOWN (UNK) tag to words that are not present in the corpus. This paper proposes extending a Hidden Markov Model (HMM)-based POS tagging approach with Naïve Bayes to predict the POS tag of such UNK words.
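The idea of combining an HMM tagger with Naïve Bayes for unseen words can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features or the smoothing used, so the suffix features, the add-one smoothing, the tiny transliterated corpus, the tag names (NN, VB, AUX), and all function names below are assumptions made only for illustration. In this sketch, known words use ordinary HMM emission probabilities, while for an out-of-vocabulary word the Naïve Bayes likelihood of its suffixes stands in for the emission probability, and Viterbi decoding combines it with the tag-transition context.

```python
import math
from collections import Counter, defaultdict

# Toy tagged corpus (transliterated Hindi, invented for illustration only).
TRAIN = [
    [("ladka", "NN"), ("khata", "VB"), ("hai", "AUX")],
    [("ghoda", "NN"), ("jata", "VB"), ("hai", "AUX")],
    [("baccha", "NN"), ("peeta", "VB"), ("hai", "AUX")],
]
START = "<s>"


def suffix_features(word):
    """Hypothetical feature set: the last one and two characters of the word."""
    return [f"suf{k}={word[-k:]}" for k in (1, 2)]


trans = defaultdict(Counter)   # trans[prev_tag][tag]   -> count
emit = defaultdict(Counter)    # emit[tag][word]        -> count
suffix = defaultdict(Counter)  # suffix[tag][feature]   -> count
for sent in TRAIN:
    prev = START
    for word, tag in sent:
        trans[prev][tag] += 1
        emit[tag][word] += 1
        for feat in suffix_features(word):
            suffix[tag][feat] += 1
        prev = tag

TAGS = sorted(emit)
VOCAB = {w for sent in TRAIN for w, _ in sent}
FEATS = {f for t in TAGS for f in suffix[t]}


def log_trans(prev, tag):
    """Add-one smoothed log P(tag | prev)."""
    total = sum(trans[prev].values())
    return math.log((trans[prev][tag] + 1) / (total + len(TAGS)))


def log_emit(tag, word):
    """Log emission score; unknown words fall back to Naive Bayes over suffixes."""
    if word in VOCAB:
        total = sum(emit[tag].values())
        return math.log((emit[tag][word] + 1) / (total + len(VOCAB)))
    # Unknown word: product of add-one smoothed P(feature | tag),
    # with one extra bucket reserved for unseen features.
    total = sum(suffix[tag].values())
    return sum(
        math.log((suffix[tag][f] + 1) / (total + len(FEATS) + 1))
        for f in suffix_features(word)
    )


def viterbi(words):
    """Return the most probable tag sequence for a list of words."""
    best = {t: (log_trans(START, t) + log_emit(t, words[0]), [t]) for t in TAGS}
    for word in words[1:]:
        new = {}
        for t in TAGS:
            score, path = max(
                (best[p][0] + log_trans(p, t), best[p][1]) for p in TAGS
            )
            new[t] = (score + log_emit(t, word), path + [t])
        best = new
    return max(best.values())[1]


if __name__ == "__main__":
    # "sota" does not occur in TRAIN, so its tag comes from the suffix model.
    print(viterbi(["ladka", "sota", "hai"]))
```

On this toy data the unseen word "sota" shares the suffix "ta" with the training verbs, so the tagger can assign it a concrete tag instead of UNK. A real system would need a proper annotated corpus and a feature set suited to Hindi and Marathi morphology.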

Acknowledgements

We thank Nikhil Malhotra, Jugnu Manhas, and Saket Apte of Makers Lab, Tech Mahindra, and Varsha Patil of AISSMS IOIT, Pune, for their support and help with this paper.

Author information

Authors and Affiliations

Department of Computer Engineering, AISSMS Institute of Information Technology, Pune, Maharashtra, India
Kirti Chiplunkar, Meghna Kharche, Tejaswini Chaudhari & Suresh Limkar
Makers Lab, Tech Mahindra, Pune, Maharashtra, India
Saurabh Shaligram

Corresponding author

Correspondence to Kirti Chiplunkar.

Editor information

Editors and Affiliations

School of Computer Engineering, Kalinga Institute of Industrial Technology, Bhubaneswar, Odisha, India
Suresh Chandra Satapathy
Department of Informatics, University of Leicester, Leicester, UK
Yu-Dong Zhang
Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges (SRMGPC), Lucknow, Uttar Pradesh, India
Vikrant Bhateja
School of Management, National Institute of Technology Karnataka, Surathkal, Karnataka, India
Ritanjali Majhi

About this paper

Cite this paper

Chiplunkar, K., Kharche, M., Chaudhari, T., Shaligram, S., Limkar, S. (2021). Prediction of POS Tagging for Unknown Words for Specific Hindi and Marathi Language. In: Satapathy, S., Zhang, YD., Bhateja, V., Majhi, R. (eds) Intelligent Data Engineering and Analytics. Advances in Intelligent Systems and Computing, vol 1177. Springer, Singapore. https://doi.org/10.1007/978-981-15-5679-1_13

DOI: https://doi.org/10.1007/978-981-15-5679-1_13
Published: 30 August 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5678-4
Online ISBN: 978-981-15-5679-1
eBook Packages: Intelligent Technologies and Robotics (R0)
