Pattern Based Bootstrapping Technique for Tamil POS Tagging

  • Conference paper
Mining Intelligence and Knowledge Exploration

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8891)

Abstract

Part-of-speech (POS) tagging is a basic preprocessing step for most text-processing NLP applications. It is a difficult task for morphologically rich languages with partially free word order. This paper describes a POS tagger for one such language, Tamil. The main difficulty in POS tagging is ambiguity: words with different POS tags can carry the same inflections and must be disambiguated using context. This paper presents a pattern-based bootstrapping approach that starts from only a small set of POS-labeled suffix context patterns. Each pattern consists of a stem and a sequence of suffixes, obtained by segmenting a word with a suffix list. The bootstrapping technique generates new patterns by iteratively masking suffixes that have a low probability of occurrence in the suffix context and replacing them with other co-occurring suffixes. We tested our system on a corpus of 20,000 Tamil documents containing 271,933 unique words. Our system achieves a precision of 87.74%.
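The two steps named in the abstract, segmentation with a suffix list and generation of new patterns by masking, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the romanized toy suffix list, the greedy longest-match stripping, the choice of the preceding suffix as the "suffix context", the threshold value, and the function names `segment` and `bootstrap_round`. The sketch also tracks only the suffix sequence of a pattern and ignores the stem.

```python
from collections import Counter

# Hypothetical romanized suffix list; the paper's actual list is not shown.
SUFFIXES = ["kal", "ai", "il", "ukku", "in"]

def segment(word, suffixes=SUFFIXES):
    """Strip known suffixes from the right, longest match first.
    Returns (stem, suffixes in surface order)."""
    found = []
    ordered = sorted(suffixes, key=len, reverse=True)
    changed = True
    while changed:
        changed = False
        for s in ordered:
            # Always leave at least one character for the stem.
            if word.endswith(s) and len(word) > len(s):
                found.insert(0, s)
                word = word[:-len(s)]
                changed = True
                break
    return word, found

def bootstrap_round(seeds, observed, threshold=0.3):
    """One round of pattern generation (assumed formulation).
    seeds: dict mapping a suffix tuple to a POS tag.
    observed: suffix tuples segmented from unlabeled text.
    A suffix whose probability given the preceding suffix is below
    `threshold` is masked and replaced by every other suffix seen in
    that context; the new pattern inherits the seed's tag."""
    context = {}  # preceding suffix -> Counter of following suffixes
    for seq in observed:
        prev = "^"  # marker for "no preceding suffix"
        for s in seq:
            context.setdefault(prev, Counter())[s] += 1
            prev = s
    new_patterns = {}
    for seq, tag in seeds.items():
        for i, s in enumerate(seq):
            prev = seq[i - 1] if i else "^"
            counts = context.get(prev, Counter())
            total = sum(counts.values())
            if total == 0 or counts[s] / total >= threshold:
                continue  # suffix is frequent enough here; keep it
            for alt in counts:
                candidate = seq[:i] + (alt,) + seq[i + 1:]
                if alt != s and candidate not in seeds:
                    new_patterns.setdefault(candidate, tag)
    return new_patterns

stem, sufs = segment("marangkalai")  # toy romanized input
seeds = {("kal", "ai"): "NOUN"}
observed = [("kal", "ai"), ("kal", "il"), ("kal", "il"),
            ("kal", "ukku"), ("kal", "il")]
generated = bootstrap_round(seeds, observed)
```

In this toy run, "ai" follows "kal" in only one of five observed sequences, so it falls below the assumed threshold and is replaced by the other suffixes seen after "kal". An iterative version would add the generated patterns to the seed set and repeat until no new patterns appear.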

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ganesh, J., Parthasarathi, R., Geetha, T.V., Balaji, J. (2014). Pattern Based Bootstrapping Technique for Tamil POS Tagging. In: Prasath, R., O’Reilly, P., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science, vol 8891. Springer, Cham. https://doi.org/10.1007/978-3-319-13817-6_25

  • DOI: https://doi.org/10.1007/978-3-319-13817-6_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13816-9

  • Online ISBN: 978-3-319-13817-6

  • eBook Packages: Computer Science (R0)
