
Discourse Tagging for Indian Languages

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8403)

Abstract

The Indian Language Discourse Project aims to develop a large corpus annotated with various types of discourse relations, both explicit and implicit. As an initial step, we have annotated corpora in three languages, Hindi, Tamil and Malayalam, which belong to the two major language families in India: Indo-Aryan and Dravidian. In this paper we describe our initial experiments in annotating the corpora of all three languages; the corpora are drawn from the health domain. The initial experiments brought out various types of discourse connectives in the three languages and showed how they vary across the languages. Even this preliminary study revealed cross-linguistic variation among the three languages. We report the inter-annotator agreement for all three languages.
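
The abstract does not say which agreement measure was used. As a minimal sketch, assuming two annotators label the same items and that Cohen's kappa is the chosen statistic (an assumption, not something the abstract states), inter-annotator agreement could be computed as follows. The function name, the label set and the toy annotations are hypothetical and are not data from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotations (illustrative only, not from the paper):
# each position is one candidate connective, labelled by relation type.
annotator_1 = ["explicit", "implicit", "explicit", "none", "explicit", "implicit"]
annotator_2 = ["explicit", "implicit", "none", "none", "explicit", "explicit"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label (for example, explicit connectives) dominates the corpus. Running the same function separately on each language's annotations would give one score per language.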

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lalitha Devi, S., Lakshmi, S., Gopalan, S. (2014). Discourse Tagging for Indian Languages. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54906-9_38

  • DOI: https://doi.org/10.1007/978-3-642-54906-9_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54905-2

  • Online ISBN: 978-3-642-54906-9

  • eBook Packages: Computer Science, Computer Science (R0)
