Annotation projection for temporal information extraction

Chris R. Giannella; Ransom K. Winder; Joseph P. Jubinski

doi:10.1017/S1351324919000044

Published online by Cambridge University Press: 15 May 2019

Chris R. Giannella*: The MITRE Corporation, 7515 Colshire Dr., McLean, VA 22102, USA
Ransom K. Winder: The MITRE Corporation, 7515 Colshire Dr., McLean, VA 22102, USA
Joseph P. Jubinski: The MITRE Corporation, 7515 Colshire Dr., McLean, VA 22102, USA
*Corresponding author. Email: cgiannella@mitre.org

Abstract

Approaches to building temporal information extraction systems typically rely on large, manually annotated corpora. Thus, porting these systems to new languages requires acquiring large corpora of manually annotated documents in the new languages. Acquiring such corpora is difficult owing to the complexity of temporal information extraction annotation. One strategy for addressing this difficulty is to reduce or eliminate the need for manually annotated corpora through annotation projection. This technique utilizes a temporal information extraction system for a source language (typically English) to automatically annotate the source language side of a parallel corpus. It then uses automatically generated word alignments to project the annotations, thereby creating noisily annotated target language training data. We developed an annotation projection technique for producing target language temporal information extraction systems. We carried out an English (source) to French (target) case study wherein we compared a French temporal information extraction system built using annotation projection with one built using a manually annotated French corpus. While annotation projection has been applied to building other kinds of Natural Language Processing tools (e.g., Named Entity Recognizers), to our knowledge, this is the first paper examining annotation projection as applied to temporal information extraction where no manual corrections of the target language annotations were made. We found that, even using manually annotated data to build a temporal information extraction system, F-scores were relatively low (<0.35), which suggests that the problem is challenging even with manually annotated data. 
Our annotation projection approach performed well (relative to the system built from manually annotated data) on some aspects of temporal information extraction (e.g., event–document creation time temporal relation prediction), but it performed poorly on the other kinds of temporal relation prediction (e.g., event–event and event–time).
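
The projection step described above can be illustrated with a minimal sketch. It is not the authors' system: the label names (`EVENT`, `TIMEX`, `O`), the majority-vote rule for target tokens with several aligned source tokens, and the toy English-French sentence pair are all assumptions made for illustration. Word alignments are assumed to arrive as (source index, target index) pairs from an automatic aligner.

```python
from collections import Counter
from typing import List, Tuple


def project_labels(
    source_labels: List[str],
    alignments: List[Tuple[int, int]],
    target_length: int,
    default: str = "O",
) -> List[str]:
    """Project token-level labels from source to target via word alignments.

    Each target token receives the most frequent non-default label among the
    source tokens aligned to it; unaligned target tokens keep the default.
    """
    votes = [Counter() for _ in range(target_length)]
    for src_idx, tgt_idx in alignments:
        label = source_labels[src_idx]
        if label != default:
            votes[tgt_idx][label] += 1

    projected = []
    for counter in votes:
        if counter:
            projected.append(counter.most_common(1)[0][0])
        else:
            projected.append(default)
    return projected


# Hypothetical toy example: "She arrived yesterday" / "Elle est arrivée hier".
source_tokens = ["She", "arrived", "yesterday"]
source_labels = ["O", "EVENT", "TIMEX"]  # output of a source-language tagger
target_tokens = ["Elle", "est", "arrivée", "hier"]
alignments = [(0, 0), (1, 1), (1, 2), (2, 3)]  # (source index, target index)

projected = project_labels(source_labels, alignments, len(target_tokens))
for token, label in zip(target_tokens, projected):
    print(f"{token}\t{label}")
```

In this toy pair the auxiliary "est" inherits the `EVENT` label because the aligner links it to "arrived". Spillover of this kind is one source of the noise in projected training data.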

Keywords

Information extraction; Machine learning; Temporal processing; Translation technology

Type: Article
Information: Natural Language Engineering, Volume 25, Issue 3, May 2019, pp. 385–403

DOI: https://doi.org/10.1017/S1351324919000044
Copyright: © Cambridge University Press 2019

