Abstract
The goal of the Indian Language Discourse Project is to develop a large corpus annotated with various types of discourse relations, both explicit and implicit. As an initial step, we have annotated corpora in three languages, Hindi, Tamil and Malayalam, which belong to the two major language families of India: Indo-Aryan and Dravidian. In this paper we describe our initial experiments in annotating the corpora of all three languages, which are drawn from the health domain. The initial experiment brought out the various types of discourse connectives in the three languages and how they vary across them; even this preliminary study revealed cross-linguistic variation among the three languages. We also report inter-annotator agreement for all three languages.
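The abstract does not name the agreement statistic that was used. As a minimal sketch, assuming Cohen's kappa (a common choice when two annotators assign categorical labels to the same items), agreement on discourse-relation sense labels could be computed as below. The function name `cohens_kappa` and the sense labels are hypothetical illustrations, not the project's tools or data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same sequence of items.

    Assumption: kappa is used here only as an illustrative agreement measure;
    the paper's actual measure is not stated in the abstract.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical sense labels assigned by two annotators to six connectives.
annotator_1 = ["Cause", "Cause", "Contrast", "Condition", "Cause", "Contrast"]
annotator_2 = ["Cause", "Contrast", "Contrast", "Condition", "Cause", "Cause"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"{kappa:.3f}")  # prints 0.455
```

For these hypothetical labels the observed agreement is 4/6 and the chance agreement is 14/36, so kappa = 5/11. Correcting for chance matters when a few senses dominate the data, because raw percentage agreement would then overstate how consistently the annotators applied the scheme.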
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lalitha Devi, S., Lakshmi, S., Gopalan, S. (2014). Discourse Tagging for Indian Languages. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54906-9_38
DOI: https://doi.org/10.1007/978-3-642-54906-9_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54905-2
Online ISBN: 978-3-642-54906-9
eBook Packages: Computer Science, Computer Science (R0)