Developing LRs for Non-scheduled Indian Languages

A Case of Magahi

  • Conference paper
Human Language Technology Challenges for Computer Science and Linguistics (LTC 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8387)

Abstract

Magahi is an Indo-Aryan language spoken mainly in the eastern parts of India. Although it has a significant number of speakers, virtually no language resources (LRs) or language technologies (LTs) have been developed for it, mainly because of its status as a non-scheduled language. The present paper describes an attempt to develop an annotated corpus of Magahi. The data are drawn mainly from a couple of Magahi blogs, some story collections in Magahi, and recordings of Magahi conversations, and are annotated at the part-of-speech (POS) level using the BIS tagset.

Notes

  1. http://dit.uvt.nl/

Corresponding author

Correspondence to Ritesh Kumar.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kumar, R., Lahiri, B., Alok, D. (2014). Developing LRs for Non-scheduled Indian Languages. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_40

  • DOI: https://doi.org/10.1007/978-3-319-08958-4_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08957-7

  • Online ISBN: 978-3-319-08958-4

  • eBook Packages: Computer Science, Computer Science (R0)
