Abstract
We present our work on the automatic identification of cause-effect relations in Tamil text. Based on an analysis of causal constructions in Tamil, we identified a set of causal markers and derived the features used to develop our language model. We manually annotated a Tamil corpus of 8648 sentences for cause-effect relations. With this corpus, we built a model for identifying causal relations using Conditional Random Fields (CRFs), a machine learning technique. Our experiments gave encouraging results. An error analysis showed that the errors can be attributed to structural interdependencies between closely occurring causal relations. After comparing these structures in Tamil and English, we claim that, at the discourse level, structural interdependencies between causal relations are more complex in Tamil than in English, owing to the free word order of Tamil.
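The abstract does not list the tag set or the features, so the following is only a minimal sketch of how a sentence annotated for cause, marker, and effect spans might be turned into token-per-line training rows for a CRF sequence labeller. The role names (`CAUSE`, `MARKER`, `EFFECT`), the BIO scheme, the feature columns, and the placeholder tokens are all assumptions made for illustration, not the authors' actual setup.

```python
from typing import List, Set, Tuple

# (start, end_exclusive, role) over token indices; role names are hypothetical.
Span = Tuple[int, int, str]


def bio_labels(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert annotated spans into one BIO label per token."""
    labels = ["O"] * n_tokens
    for start, end, role in spans:
        labels[start] = f"B-{role}"
        for i in range(start + 1, end):
            labels[i] = f"I-{role}"
    return labels


def to_training_rows(tokens: List[str], spans: List[Span],
                     markers: Set[str]) -> List[str]:
    """Build tab-separated rows: word, marker flag, previous word,
    next word, label. The feature choice is illustrative only."""
    labels = bio_labels(len(tokens), spans)
    rows = []
    for i, (word, label) in enumerate(zip(tokens, labels)):
        is_marker = "1" if word in markers else "0"
        prev_word = tokens[i - 1] if i > 0 else "<BOS>"
        next_word = tokens[i + 1] if i < len(tokens) - 1 else "<EOS>"
        rows.append("\t".join([word, is_marker, prev_word, next_word, label]))
    return rows


if __name__ == "__main__":
    # Placeholder tokens stand in for a real Tamil sentence.
    tokens = ["t1", "t2", "MARKER", "t3", "t4"]
    spans = [(0, 2, "CAUSE"), (2, 3, "MARKER"), (3, 5, "EFFECT")]
    for row in to_training_rows(tokens, spans, markers={"MARKER"}):
        print(row)
```

Rows of this shape, one token per line with the label in the last column, are a common input layout for CRF toolkits; a real system would add linguistic features such as part-of-speech tags or clause boundaries.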
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
S., M., Rao, P.R.K., Lalitha Devi, S. (2011). Automatic Identification of Cause-Effect Relations in Tamil Using CRFs. In: Gelbukh, A.F. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19400-9_25
DOI: https://doi.org/10.1007/978-3-642-19400-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19399-6
Online ISBN: 978-3-642-19400-9
eBook Packages: Computer Science, Computer Science (R0)