Annotating Sanskrit Corpus: Adapting IL-POSTS

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6562)

Abstract

In this paper we present an experiment on using the hierarchical Indic Languages POS Tagset (IL-POSTS) (Baskaran et al. 2008a, 2008b), developed by Microsoft Research India (MSRI) for tagging Indian languages, to annotate a Sanskrit corpus. Sanskrit is a language with rich morphology and relatively free word order. The authors have added and removed certain tags according to the requirements of the Sanskrit data. A revision of the IL-POSTS annotation guidelines is also presented. Finally, the authors describe an experiment in training the tagger at MSRI and report the results.

References

  • AU-KBC tagset. AU-KBC POS tagset for Tamil, http://nrcfosshelpline.in/smedia/images/downloads/Tamil_Tagset-opensource.odt

  • Baskaran, S., Bali, K., Bhattacharya, T., Bhattacharyya, P., Choudhury, M., Jha, G.N., Rajendran, S., Saravanan, K., Sobha, L., Subbarao, K.V.S.: A Common Parts-of-Speech Tagset Framework for Indian Languages. In: LREC 2008 - 6th Language Resources and Evaluation Conference, Marrakech, Morocco, May 26 - June 1 (2008)

  • Baskaran, S., Bali, K., Bhattacharya, T., Bhattacharyya, P., Choudhury, M., Jha, G.N., Rajendran, S., Saravanan, K., Sobha, L., Subbarao, K.V.S.: Designing a Common POS-Tagset Framework for Indian Languages. In: The 6th Workshop on Asian Language Resources, Hyderabad (January 2008)

  • Cardona, G.: Pāṇini: His Work and its Traditions. Motilal Banarsidass, Delhi (1988)

  • Chandrashekar, R.: POS Tagger for Sanskrit, Ph.D. thesis, Jawaharlal Nehru University (2007)

  • Cloeren, J.: Tagsets. In: van Halteren, H. (ed.) Syntactic Wordclass Tagging. Kluwer Academic, Dordrecht (1999)

  • Jha, G.N.: Generating nominal inflectional morphology in Sanskrit. In: SIMPLE 2004, IIT-Kharagpur Lecture Compendium, Shyama Printing Works, Kharagpur, WB (2004)

  • Jha, G.N., Sobha, L., Mishra, D., Singh, S.K., Pralayankar, P.: Anaphors in Sanskrit. In: Johansson, C. (ed.) Proceedings of the Second Workshop on Anaphora Resolution (2008), vol. 2, Cambridge Scholars Publishing (2007) ISSN 1736-6305

  • Jha, G.N., Mishra, S.K.: Semantic processing in Panini’s karaka system. In: Huet, G., Kulkarni, A., Scharf, P. (eds.) Sanskrit Computational Linguistics. LNCS, vol. 5402, pp. 239–252. Springer, Heidelberg (2009)

  • Greene, B.B., Rubin, G.M.: Automatic grammatical tagging of English. Department of Linguistics, Brown University, Providence, R.I (1981)

  • Hardie, A.: The Computational Analysis of Morphosyntactic Categories in Urdu. PhD Thesis submitted to Lancaster University (2004)

  • IIIT-Tagset. A Parts-of-Speech tagset for Indian Languages, http://shiva.iiit.ac.in/SPSAL2007/iiit_tagset_guidelines.pdf

  • Huet, G.: The Sanskrit Heritage Site, http://sanskrit.inria.fr/

  • Kale, M.R.: A Higher Sanskrit Grammar. MLBD Publishers, New Delhi (1995)

  • Leech, G., Wilson, A.: Recommendations for the Morphosyntactic Annotation of Corpora. EAGLES Report EAG-TCWG-MAC/R (1996)

  • Leech, G., Wilson, A.: Standards for Tag-sets. In: van Halteren, H. (ed.) Syntactic Wordclass Tagging. Kluwer Academic, Dordrecht (1999)

  • Leech, G.: Grammatical Tagging. In: Garside, R., Leech, G., McEnery, A. (eds.) Corpus Annotation: Linguistic Information for Computer Text Corpora. Longman, London (1997)

  • Mishra, S.K., Jha, G.N.: Identifying verb inflections in Sanskrit morphology. In: Proceedings of SIMPLE 2004, IIT Kharagpur (2005)

  • NLPAI Contest-2006, http://ltrs.iiit.ac.in/nlpai_cntest06

  • Hellwig, O.: A Stochastic Lexical and POS Tagger for Sanskrit. In: Huet, G., Kulkarni, A., Scharf, P. (eds.) Sanskrit Computational Linguistics. LNCS, vol. 5402, pp. 266–277. Springer, Heidelberg (2009)

  • Ramkrishnamacharyulu, K.V.: Annotating Sanskrit Texts Based on Sabdabodha Systems. In: Kulkarni, A., Huet, G. (eds.) Sanskrit Computational Linguistics. LNCS (LNAI), vol. 5406, pp. 26–39. Springer, Heidelberg (2009)

  • Santorini, B.: Part-of-speech tagging guidelines for the Penn Treebank Project. Technical report MS-CIS-90-47, Dept. of Computer and Information Science, University of Pennsylvania (1990)

  • Chandra, S.: Sanskrit Subanta Recognizer and Analyzer. M.Phil. dissertation, Jawaharlal Nehru University (2006)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jha, G.N., Gopal, M., Mishra, D. (2011). Annotating Sanskrit Corpus: Adapting IL-POSTS. In: Vetulani, Z. (ed.) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2009. Lecture Notes in Computer Science, vol. 6562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20095-3_34

  • DOI: https://doi.org/10.1007/978-3-642-20095-3_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20094-6

  • Online ISBN: 978-3-642-20095-3

  • eBook Packages: Computer Science, Computer Science (R0)