Evaluating Tagsets for Sanskrit

  • Conference paper
Sanskrit Computational Linguistics (ISCLS 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6465)

Abstract

In this paper we present an evaluation of the available part-of-speech (POS) tagsets developed in India for tagging Sanskrit and other Indian languages. The tagsets evaluated are the JNU-Sanskrit tagset (JPOS), the Sanskrit consortium tagset (CPOS), the MSRI-Sanskrit tagset (IL-POST), the IIIT Hyderabad tagset (ILMT POS), and the CIIL Mysore tagset for the Linguistic Data Consortium for Indian Languages (LDCIL) project (LDCPOS). The main goal is to check the suitability of the existing tagsets for Sanskrit from various Natural Language Processing (NLP) points of view.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gopal, M., Mishra, D., Singh, D.P. (2010). Evaluating Tagsets for Sanskrit. In: Jha, G.N. (ed.) Sanskrit Computational Linguistics. ISCLS 2010. Lecture Notes in Computer Science, vol 6465. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17528-2_11

  • DOI: https://doi.org/10.1007/978-3-642-17528-2_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17527-5

  • Online ISBN: 978-3-642-17528-2

  • eBook Packages: Computer Science, Computer Science (R0)
