Prediction of POS Tagging for Unknown Words for Specific Hindi and Marathi Language

  • Conference paper
Intelligent Data Engineering and Analytics

Abstract

Part-of-speech (POS) tagging for Indian languages such as Hindi and Marathi remains largely unexplored. Some of the best taggers available for Indian languages use hybrids of machine learning or stochastic techniques and phonetic information. Available corpora for Hindi and Marathi are limited, so Natural Language Processing (NLP) applied to Hindi and Marathi sentences often does not achieve the desired results. In particular, current POS tagging techniques assign an UNKNOWN (UNK) tag to words that are not present in the corpus. This paper proposes extending a Hidden Markov Model (HMM)-based approach to POS tagging with the Naïve Bayes theorem to predict a POS tag for such UNK words.
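
The abstract does not spell out the model, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a toy transliterated Hindi corpus, a most-frequent-tag lookup standing in for full HMM decoding, and a Naïve Bayes predictor whose two features (the word's two-letter suffix and the previous tag) are illustrative choices. All names, such as UnknownTagPredictor and TRAIN, are hypothetical.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (transliterated Hindi). Illustrative only, not from the paper.
TRAIN = [
    [("ladka", "NN"), ("khata", "VB"), ("hai", "AUX")],
    [("ladki", "NN"), ("gata", "VB"), ("hai", "AUX")],
    [("accha", "JJ"), ("ladka", "NN"), ("sota", "VB"), ("hai", "AUX")],
    [("bacha", "NN"), ("rota", "VB"), ("hai", "AUX")],
]


def suffix(word, n=2):
    """Last n characters of the word (an assumed, simplistic feature)."""
    return word[-n:]


class UnknownTagPredictor:
    """Lookup tagger that falls back to Naive Bayes for unseen words."""

    def __init__(self, sentences):
        self.tag_counts = Counter()
        self.word_tags = defaultdict(Counter)
        # Per-feature counts: feat_counts[name][tag][value]
        self.feat_counts = {"suffix": defaultdict(Counter),
                            "prev": defaultdict(Counter)}
        self.feat_vocab = {"suffix": set(), "prev": set()}
        for sent in sentences:
            prev = "<s>"  # sentence-start marker
            for word, tag in sent:
                self.tag_counts[tag] += 1
                self.word_tags[word][tag] += 1
                for name, value in (("suffix", suffix(word)), ("prev", prev)):
                    self.feat_counts[name][tag][value] += 1
                    self.feat_vocab[name].add(value)
                prev = tag

    def _likelihood(self, name, value, tag):
        # P(feature value | tag) with add-one smoothing.
        counts = self.feat_counts[name][tag]
        return (counts[value] + 1) / (sum(counts.values())
                                      + len(self.feat_vocab[name]))

    def predict_unknown(self, word, prev_tag):
        # Naive Bayes: argmax over tags of P(tag) * P(suffix|tag) * P(prev|tag).
        total = sum(self.tag_counts.values())

        def score(tag):
            return (self.tag_counts[tag] / total
                    * self._likelihood("suffix", suffix(word), tag)
                    * self._likelihood("prev", prev_tag, tag))

        return max(self.tag_counts, key=score)

    def tag(self, words, predict=True):
        tagged, prev = [], "<s>"
        for word in words:
            if word in self.word_tags:
                # Known word: most frequent tag seen in training.
                tag = self.word_tags[word].most_common(1)[0][0]
            elif predict:
                tag = self.predict_unknown(word, prev)
            else:
                tag = "UNK"  # baseline behaviour for out-of-corpus words
            tagged.append((word, tag))
            prev = tag
        return tagged


model = UnknownTagPredictor(TRAIN)
sentence = "ladka peeta hai".split()  # "peeta" never appears in TRAIN
baseline = model.tag(sentence, predict=False)
extended = model.tag(sentence)
print(baseline)
print(extended)
```

With predict=False the unseen word keeps the UNK tag, which mirrors the limitation described in the abstract. With the fallback enabled, the suffix and the preceding tag together select a tag for it. A fuller version would replace the lookup step with Viterbi decoding over HMM transition and emission probabilities.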

Acknowledgements

We are thankful to Nikhil Malhotra, Jugnu Manhas, and Saket Apte of Maker’s Lab, Tech Mahindra, and to Varsha Patil of AISSMS IOIT, Pune, for their support and help with this paper.

Author information

Corresponding author

Correspondence to Kirti Chiplunkar.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chiplunkar, K., Kharche, M., Chaudhari, T., Shaligram, S., Limkar, S. (2021). Prediction of POS Tagging for Unknown Words for Specific Hindi and Marathi Language. In: Satapathy, S., Zhang, YD., Bhateja, V., Majhi, R. (eds) Intelligent Data Engineering and Analytics. Advances in Intelligent Systems and Computing, vol 1177. Springer, Singapore. https://doi.org/10.1007/978-981-15-5679-1_13
