Abstract
Part-of-speech (POS) tagging for Indian languages such as Hindi and Marathi remains largely unexplored. Some of the best taggers available for Indian languages use hybrids of machine learning or stochastic techniques and phonetic information. The available corpora for Hindi and Marathi are limited, so Natural Language Processing (NLP) applied to Hindi and Marathi sentences does not achieve the desired results. Current POS tagging techniques assign the UNKNOWN (UNK) tag to words that are not present in the corpus. This paper proposes how a Hidden Markov Model (HMM)-based approach to POS tagging can be extended using the Naïve Bayes theorem to predict the POS tag of such UNK words.
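The general idea of backing off to Naïve Bayes for out-of-corpus words can be illustrated with a minimal sketch. The abstract does not say which features or smoothing the authors use, so everything below is an assumption made for illustration: the toy transliterated corpus, the tag names, the one- and two-character suffix features, the add-one smoothing, and the hypothetical names `suffix_features` and `HmmNbTagger`. Known words use ordinary HMM emission counts; unknown words use a Naïve Bayes likelihood over suffix features in place of the emission term, and Viterbi decoding combines either score with the tag transitions.

```python
import math
from collections import Counter, defaultdict

# Toy tagged corpus (hypothetical, transliterated Hindi); not taken from the paper.
TRAIN = [
    [("ladka", "NN"), ("khata", "VB"), ("hai", "AUX")],
    [("ladka", "NN"), ("jata", "VB"), ("hai", "AUX")],
    [("bachcha", "NN"), ("sota", "VB"), ("hai", "AUX")],
    [("ghoda", "NN"), ("daudta", "VB"), ("hai", "AUX")],
]


def suffix_features(word):
    """Assumed features: the last one and last two characters of the word."""
    return ["s1=" + word[-1:], "s2=" + word[-2:]]


class HmmNbTagger:
    """Bigram HMM tagger with a Naive Bayes fallback for unknown words."""

    def __init__(self, corpus):
        self.trans = defaultdict(Counter)  # previous tag -> next-tag counts
        self.emit = defaultdict(Counter)   # tag -> word counts
        self.feat = defaultdict(Counter)   # tag -> suffix-feature counts
        self.vocab = set()
        self.feature_set = set()
        for sentence in corpus:
            prev = "<s>"
            for word, tag in sentence:
                self.trans[prev][tag] += 1
                self.emit[tag][word] += 1
                self.vocab.add(word)
                for f in suffix_features(word):
                    self.feat[tag][f] += 1
                    self.feature_set.add(f)
                prev = tag
        self.tags = sorted(self.emit)

    def log_trans(self, prev, tag):
        # Add-one smoothed transition probability P(tag | prev).
        counts = self.trans[prev]
        return math.log((counts[tag] + 1) / (sum(counts.values()) + len(self.tags)))

    def log_emit(self, tag, word):
        if word in self.vocab:
            # Known word: add-one smoothed HMM emission P(word | tag).
            counts = self.emit[tag]
            return math.log((counts[word] + 1) / (sum(counts.values()) + len(self.vocab)))
        # Unknown word: Naive Bayes likelihood, the product of P(feature | tag).
        counts = self.feat[tag]
        total = sum(counts.values()) + len(self.feature_set)
        return sum(math.log((counts[f] + 1) / total) for f in suffix_features(word))

    def tag(self, words):
        # Viterbi decoding in log space; each entry holds (score, best path).
        best = {
            t: (self.log_trans("<s>", t) + self.log_emit(t, words[0]), [t])
            for t in self.tags
        }
        for word in words[1:]:
            step = {}
            for t in self.tags:
                score, path = max(
                    (best[p][0] + self.log_trans(p, t), best[p][1]) for p in self.tags
                )
                step[t] = (score + self.log_emit(t, word), path + [t])
            best = step
        return max(best.values())[1]


tagger = HmmNbTagger(TRAIN)
# "raja" and "gata" never occur in TRAIN, so they take the Naive Bayes path.
print(tagger.tag(["raja", "gata", "hai"]))
```

On this toy corpus the unseen words are tagged NN and VB rather than UNK: the "ta" suffix favours VB for "gata", while the sentence-initial transition favours NN for "raja". A real system would need a far larger corpus and features chosen for Hindi or Marathi morphology.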
Acknowledgements
We are thankful to Nikhil Malhotra, Jugnu Manhas, and Saket Apte of Maker’s Lab, Tech Mahindra, and to Varsha Patil of AISSMS IOIT, Pune, for their support and help with this paper.
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chiplunkar, K., Kharche, M., Chaudhari, T., Shaligram, S., Limkar, S. (2021). Prediction of POS Tagging for Unknown Words for Specific Hindi and Marathi Language. In: Satapathy, S., Zhang, YD., Bhateja, V., Majhi, R. (eds) Intelligent Data Engineering and Analytics. Advances in Intelligent Systems and Computing, vol 1177. Springer, Singapore. https://doi.org/10.1007/978-981-15-5679-1_13
DOI: https://doi.org/10.1007/978-981-15-5679-1_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5678-4
Online ISBN: 978-981-15-5679-1
eBook Packages: Intelligent Technologies and Robotics (R0)