Abstract
Temporal annotation of plain text is considered a useful component of modern information retrieval tasks. In this work, two approaches to identifying and classifying temporal entities in Hindi are developed and analyzed. First, a rule-based approach is developed that takes plain text as input and, using a set of hand-crafted rules, produces tagged output in which temporal expressions are marked. This approach achieves a strict F1-measure of 0.83. In the second approach, a CRF-based classifier is trained on human-annotated data and then evaluated on a test dataset. The trained classifier identifies temporal expressions in plain text and further classifies them into various classes. This approach achieves a strict F1-measure of 0.78. In the process, a reusable gold-standard dataset for temporal tagging in Hindi was developed. Named the ILTIMEX2012 corpus, it consists of 300 manually tagged Hindi news documents.
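The abstract does not list the hand-crafted rules, so the following is only a minimal sketch of the general shape of a rule-based temporal tagger and of strict scoring, not the authors' system. The patterns, the helper names `rule`, `tag` and `strict_prf`, and the longest-match policy for overlapping rules are all hypothetical illustrations. "Strict" is taken here to mean that a predicted expression counts only if its start and end offsets exactly match a gold-standard expression. The CRF-based classifier is not shown.

```python
import re
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def rule(body: str) -> re.Pattern:
    # Hindi vowel signs are not \w characters in Python's re, so \b is unreliable;
    # instead require whitespace/punctuation (or the text edges) around each match.
    return re.compile(rf"(?<!\S)(?:{body})(?![^\s।,.!?])")


# Hypothetical, deliberately tiny rule set (one spelling per month name only).
MONTHS = "जनवरी|फरवरी|मार्च|अप्रैल|मई|जून|जुलाई|अगस्त|सितंबर|अक्टूबर|नवंबर|दिसंबर"
RULES = [
    rule(rf"\d{{1,2}}\s+(?:{MONTHS})(?:\s+\d{{4}})?"),    # day month [year]: "15 अगस्त 1947"
    rule(r"(?:पिछले|अगले)\s+(?:साल|वर्ष|महीने|सप्ताह)"),  # "पिछले साल" (last year)
    rule(r"आज|कल|परसों"),                                 # deictic day words
    rule(r"\d{4}"),                                        # bare year: "2012"
]


def tag(text: str) -> List[Span]:
    """Apply every rule and keep non-overlapping spans, preferring longer matches."""
    candidates = [m.span() for r in RULES for m in r.finditer(text)]
    # Longest first, then leftmost, so a full date beats the bare year inside it.
    candidates.sort(key=lambda s: (-(s[1] - s[0]), s[0]))
    chosen: List[Span] = []
    for s, e in candidates:
        if all(e <= cs or s >= ce for cs, ce in chosen):
            chosen.append((s, e))
    return sorted(chosen)


def strict_prf(pred: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Strict precision, recall and F1: only exact (start, end) matches count."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    sentence = "15 अगस्त 1947 को आज़ादी मिली और पिछले साल भी उत्सव हुआ।"
    predicted = tag(sentence)
    for s, e in predicted:
        print(sentence[s:e])
    gold = {(sentence.index(x), sentence.index(x) + len(x))
            for x in ("15 अगस्त 1947", "पिछले साल")}
    print(strict_prf(set(predicted), gold))
```

Longest-match resolution is just one simple way to reconcile overlapping rules. Under strict scoring, a tagger that returned only the bare year `1947` for the gold expression `15 अगस्त 1947` would receive no credit for that expression.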
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Ramrakhiyani, N., Majumder, P. (2013). Temporal Expression Recognition in Hindi. In: Prasath, R., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science, vol 8284. Springer, Cham. https://doi.org/10.1007/978-3-319-03844-5_72
DOI: https://doi.org/10.1007/978-3-319-03844-5_72
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03843-8
Online ISBN: 978-3-319-03844-5
eBook Packages: Computer Science, Computer Science (R0)