A Study on the Importance of Linguistic Suffixes in Maithili POS Tagger Development

  • Conference paper
Mining Intelligence and Knowledge Exploration (MIKE 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11987)

Abstract

This paper presents our study on the effect of morphological inflections on the performance of a Maithili part-of-speech (POS) tagger. In the last few years, substantial effort has been devoted to developing morphological analyzers and POS taggers for several Indian languages, including Hindi, Bengali, Tamil, Telugu, Kannada, Punjabi, and Marathi. However, we did not find any open POS tagger or morphological analyzer for Maithili, even though it is one of the official languages of India, with around 50 million native speakers. We therefore worked on developing a Maithili POS tagger. For the development, we used a manually annotated in-house Maithili corpus containing 52,190 tokens. The tagset contains 27 tags. We first trained a conditional random fields (CRF) classifier with various combinations of word unigram, bigram, fixed-length suffix, and prefix features. We observed that the fixed-length suffixes did not yield the expected accuracy improvement. During manual corpus annotation, however, we had observed that suffixes served as a helpful clue. So, instead of using fixed-length suffixes, we worked on identifying the morphological inflections in Maithili. When we used these morphological suffixes in the system, we found a noticeable performance improvement.
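The contrast between the two feature sets described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the dict-per-token feature layout, the maximum affix length of 3, and the entries in `PLACEHOLDER_SUFFIXES` are all assumptions made for illustration, and the suffix strings are placeholders rather than the inflection list identified in the paper.

```python
from typing import Dict, List, Sequence

# Placeholder suffix inventory for illustration only; these strings are
# NOT the morphological inflection list from the paper.
PLACEHOLDER_SUFFIXES: Sequence[str] = ("lanhi", "ak", "me", "sa")


def fixed_length_affixes(word: str, max_len: int = 3) -> Dict[str, str]:
    """Return prefix and suffix features of every length from 1 to max_len."""
    feats: Dict[str, str] = {}
    for n in range(1, max_len + 1):
        if len(word) > n:  # skip affixes that would cover the whole word
            feats[f"suf{n}"] = word[-n:]
            feats[f"pre{n}"] = word[:n]
    return feats


def linguistic_suffix(
    word: str, suffixes: Sequence[str] = PLACEHOLDER_SUFFIXES
) -> str:
    """Return the longest listed suffix that ends the word, or 'NONE'."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if len(word) > len(suf) and word.endswith(suf):
            return suf
    return "NONE"


def token_features(
    tokens: List[str], i: int, use_linguistic: bool
) -> Dict[str, str]:
    """Build word unigram/bigram features plus one kind of affix feature."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else "<BOS>"
    next_word = tokens[i + 1] if i < len(tokens) - 1 else "<EOS>"
    feats = {
        "w0": word,                       # current word (unigram)
        "w-1": prev_word,                 # previous word
        "w+1": next_word,                 # next word
        "w-1|w0": f"{prev_word}|{word}",  # word bigram
    }
    if use_linguistic:
        feats["ling_suf"] = linguistic_suffix(word)
    else:
        feats.update(fixed_length_affixes(word))
    return feats


if __name__ == "__main__":
    sentence = ["gharak", "lok"]  # romanized toy tokens, assumed for the demo
    print(token_features(sentence, 0, use_linguistic=False))
    print(token_features(sentence, 0, use_linguistic=True))
```

With fixed-length affixes every word contributes several suffix strings, most of which carry no grammatical meaning; with a suffix list, a word contributes a single feature that fires only when a known inflection is present. Per-token dictionaries of this kind can be passed to a CRF toolkit that accepts them, or flattened into columns for a template-based one.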

Funding

This work was supported by the Science and Engineering Research Board, India [Grant No. EEQ/2016/000241].

Author information

Corresponding author

Correspondence to Sujan Kumar Saha.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Priyadarshi, A., Saha, S.K. (2020). A Study on the Importance of Linguistic Suffixes in Maithili POS Tagger Development. In: B. R., P., Thenkanidiyoor, V., Prasath, R., Vanga, O. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2019. Lecture Notes in Computer Science, vol 11987. Springer, Cham. https://doi.org/10.1007/978-3-030-66187-8_2

  • DOI: https://doi.org/10.1007/978-3-030-66187-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66186-1

  • Online ISBN: 978-3-030-66187-8

  • eBook Packages: Computer Science (R0)
