Abstract
Temporal Information Extraction (TIE) plays an important role in many natural language processing and database applications. Temporal slot filling (TSF) is a new and ambitious TIE task prepared for the knowledge base population (KBP2011) track of the NIST Text Analysis Conference. TSF requires systems to discover temporally bound facts about entities and their attributes in order to populate a structured knowledge base. In this paper, we provide an overview of the unique challenges of this new task and our novel approaches to addressing them. We present the challenges from three perspectives: (1) Temporal information representation: We review the relevant linguistic semantic theories of temporal information and their limitations, motivating the need for a new (4-tuple) representation framework for the task. (2) Annotation acquisition: The lack of substantial labeled training data for supervised learning is a limiting factor in the design of TSF systems. Our work examines the use of multi-class logistic regression methods to improve the labeling quality of training data obtained by distant supervision. (3) Temporal information classification: Another key challenge lies in capturing relations between salient text elements separated by a long context. We develop two approaches for temporal classification and combine them through cross-document aggregation: a flat approach that uses lexical context and shallow dependency features, and a structured approach that captures long syntactic contexts by using a dependency path kernel tailored for this task. Experimental results demonstrate that our annotation enhancement approach dramatically increases the speed of the training procedure (by almost 100 times), and that the flat and structured classification approaches are complementary, together yielding a state-of-the-art TSF system.
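As a rough illustration of the 4-tuple representation and cross-document aggregation mentioned above, the sketch below assumes the usual KBP2011 reading of a 4-tuple: the fact starts somewhere between t1 and t2 and ends somewhere between t3 and t4. The abstract does not spell out these semantics, so the class `TemporalTuple`, the function `aggregate`, and the "keep the tightest bounds" merging rule are all hypothetical and should not be read as the authors' implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class TemporalTuple:
    """Hypothetical 4-tuple: start in [t1, t2], end in [t3, t4].

    None means the bound is unknown (unconstrained) on that side.
    """
    t1: Optional[date] = None  # earliest possible start
    t2: Optional[date] = None  # latest possible start
    t3: Optional[date] = None  # earliest possible end
    t4: Optional[date] = None  # latest possible end


def _tighter_lower(a: Optional[date], b: Optional[date]) -> Optional[date]:
    """Return the later of two lower bounds, treating None as 'no bound'."""
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


def _tighter_upper(a: Optional[date], b: Optional[date]) -> Optional[date]:
    """Return the earlier of two upper bounds, treating None as 'no bound'."""
    if a is None:
        return b
    if b is None:
        return a
    return min(a, b)


def aggregate(tuples: Iterable[TemporalTuple]) -> TemporalTuple:
    """Merge per-document tuples by keeping the tightest bound on each side.

    Assumed aggregation rule for illustration only.
    """
    result = TemporalTuple()
    for t in tuples:
        result = TemporalTuple(
            _tighter_lower(result.t1, t.t1),
            _tighter_upper(result.t2, t.t2),
            _tighter_lower(result.t3, t.t3),
            _tighter_upper(result.t4, t.t4),
        )
    return result


# Document A: "joined in 2005" -> start within 2005, end no earlier than 2005.
doc_a = TemporalTuple(date(2005, 1, 1), date(2005, 12, 31), date(2005, 1, 1), None)
# Document B: "was still in the role on 2008-06-01" -> started by then, ended no earlier.
doc_b = TemporalTuple(None, date(2008, 6, 1), date(2008, 6, 1), None)

merged = aggregate([doc_a, doc_b])
print(merged)
```

In this toy case the merged tuple keeps the narrow start window from document A and the later lower bound on the end date from document B, which is the kind of complementary evidence that cross-document aggregation is meant to exploit.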
Notes
We refer to events and states collectively as eventualities, following [5].
We assume throughout that eventualities as concepts correspond with situations in the physical world.
We consider the usage of a predicate of eventualities to be a mention of the eventuality for which it returns a positive truth value.
Vendler alluded to the fact that not only verbs but also adjectives and nouns may be used to express eventualities. Dölling gave a formal account of aspectual coercion, in which the canonical conceptualization of an event may be adjusted based on factors such as which modifier is applied to it (and, conversely, that of a modifier based on which event it is applied to), as well as world knowledge.
In fact, “NONE” is ambiguous among four cases: (1) the query and slot fill are in relation slot_type, but in this context it is not explicitly related to time \(T\); (2) the query and slot fill are not in relation slot_type, or any other relation; (3) the query and slot fill are in relation slot_type*, which is not explicitly related to \(T\); and (4) the query and slot fill are in relation slot_type*, which is explicitly related to \(T\), but we still label \(\langle \) slot_type(query, slot fill), \(T\rangle \) as “NONE”.
References
Ahn D, Adafre SF, de Rijke M (2005) Extracting temporal information from open domain text: a comparative exploration. Digit Inform Manag 3:3–10
Allen JF (1983) Maintaining knowledge about temporal intervals. Commun ACM 26:832–843
Amigo E, Artiles J, Li Q, Ji H (2011) An evaluation framework for aggregated temporal information extraction. In: Proceedings of SIGIR 2011 workshop on entity-oriented search
Aseervatham S, Antoniadis A, Gaussier E, Burlet M, Denneulin Y (2011) A sparse version of the ridge logistic regression for large-scale text categorization. Pattern Recognit Lett 32:101–106
Bach E (1986) The algebra of events. Linguist Philos 9:5–16
Baral C, Gelfond G, Gelfond M, Scherl RB (2005) Textual inference by combining multiple logic programming paradigms. In: Proceedings of AAAI 2005 workshop on inference for textual question answering
Bell A (1999) News stories and narratives. Oxford University Press, Oxford, pp 236–251
Bethard S, Martin JH (2007) CU-TMP: temporal relation classification using syntactic and semantic features. In: SemEval-2007: 4th international workshop on semantic evaluations
Bethard S, Martin JH (2008) Learning semantic links from a corpus of parallel temporal and causal relations. In: Proceedings of annual meeting of the association for computational linguistics: human language technologies (ACL-HLT) vol 1(4)
Bethard S, Martin JH, Klingenstein S (2007) Finding temporal structure in text: machine learning of syntactic temporal relations. Int J Semant Comput (IJSC) 1(4):441–457
Bollacker K, Cook R, Tufts P (2008) Freebase: a shared database of structured general human knowledge. In: Proceedings of national conference on artificial intelligence
Boguraev B, Ando RK (2005) TimeBank-driven TimeML analysis. In: Proceedings of annotating, extracting and reasoning about time and events
Bramsen P, Deshpande P, Lee YK, Barzilay R (2006) Inducing temporal graphs. In: Proceedings of conference on empirical methods in natural language processing
Bunescu RC, Mooney RJ (2005) A shortest path dependency kernel for relation extraction. In: Proceedings of the HLT and EMNLP, pp 724–731
Chambers N, Wang S, Jurafsky D (2007) Classifying temporal relations between events. In: Annual meeting of the association for computational linguistics (ACL). pp 173–176
Chambers N, Jurafsky D (2009) Unsupervised learning of narrative schemas and their participants. In: Proceedings of the 47th annual meeting of the association for computational linguistics and the 4th international joint conference on natural language processing of the Asian federation of natural language processing (ACL-IJCNLP 2009). pp 173–176
Chambers N, Jurafsky D (2008) Jointly combining implicit constraints improves temporal ordering. In: Proceedings of empirical methods in natural language processing
Chang C-C, Lin C-J (2001) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3). doi: 10.1145/1961189.1961199
Chen Z, Tamang S, Lee A, Li X, Lin W, Snover M, Artiles J, Passantino M, Ji H (2010) Cuny-blender tac-kbp 2010 entity linking and slot filling system description. In: Proceedings of the 2010 text analysis conference
Cortes C, Vapnik V (1995) Support-vector networks. In: Machine learning. pp 273–297
Davidson D (1967) The logical form of action sentences. In: Rescher N (ed) The logic of decision and action. University of Pittsburgh Press, Pittsburgh
De Marneffe M-C, Maccartney B, Manning CD (2006) Generating typed dependency parses from phrase structure parses. In: LREC 2006
De Marneffe M-C, Manning CD (2006) Stanford typed dependencies manual. Technical report. Department of Computer Science, Stanford University
Denis P, Muller P (2011) Predicting globally-coherent temporal structures from texts via endpoint inference and graph decomposition. In: IJCAI. pp 1788–1793
Dölling J (2013) Aspectual coercion and eventuality structure. In: Robering K (ed) Aspects, phases and arguments: topics in the semantics of verbs. John Benjamins, Amsterdam, pp 113–146
Do Q, Lu W, Roth D (2012) Joint inference for event timeline construction. In: Proceedings of empirical methods for natural language processing (EMNLP2012)
Dowty D (1986) The effects of aspectual class on the temporal structure of discourse: semantics or pragmatics? Linguist Philos 9:37–61
Elhadad N, Barzilay R, McKeown K (2002) Inferring strategies for sentence ordering in multidocument summarization. JAIR 17:35–55
Fellbaum C (1998) WordNet: an electronic lexical database. The MIT Press, Cambridge
Finkel JR, Grenager T, Manning CD (2005) Incorporating non-local information into information extraction systems by gibbs sampling. In: ACL
Gupta P, Ji H (2009) Predicting unknown time arguments based on cross-event propagation. In: Proceedings of ACL-IJCNLP 2009
Hinrichs E (1986) Temporal anaphora in discourses of English. Linguist Philos 9:63–82
Hitzeman J, Moens M, Grover C (1995) Algorithms for analysing the temporal structure of discourse. In: Proceedings of the seventh conference on European chapter of the association for computational linguistics, EACL’95. pp 253–260
Ji H, Grishman R (2008) Refining event extraction through unsupervised cross-document inference. In: Proceedings of the annual meeting of the association for computational linguistics
Ji H, Grishman R, Chen Z, Gupta P (2009) Cross-document event extraction and tracking: task, evaluation, techniques and challenges. In: Proceedings of recent advances in natural language processing
Ji H, Grishman R, Dang HT (2011) An overview of the TAC2011 knowledge base population track. In: Proceedings of text analysis conference (TAC)
Kamp H (1981) A theory of truth and semantic representation. In: Portner P, Partee BH (eds) Formal semantics: the essential readings. Blackwell, Oxford, pp 189–222
Katz G (2000) Anti neo-Davidsonianism. In: Events as grammatical objects. CSLI Publications pp 393–416
Kingsbury P, Palmer M (2002) From TreeBank to PropBank. In: Proceedings of the 3rd international conference on language resources and evaluation (LREC)
Lapata M, Lascarides A (2006) Learning sentence-internal temporal relations. J AI Res pp 85–117
Lascarides A, Asher N (1993) Temporal interpretation, discourse relations, and common sense entailment. Linguist Philos 16:437–493
Li Q, Anzaroot S, Lin W, Li X, Ji H (2011) Joint inference for cross-document information extraction. In: Proceedings of 20th ACM conference on information and knowledge management (CIKM2011)
Ling X, Weld D (2010) Temporal information extraction. In: Proceedings of the twenty fifth national conference on artificial intelligence
Li F, Yang Y, Xing EP (2005) From Lasso regression to feature vector machine. In: NIPS2005
Lodhi H, Saunders C, Shawe-Taylor J, Cristianini N, Watkins C (2002) Text classification using string kernels. J Mach Learn Res 2:419–444
Mani I, Verhagen M, Wellner B, Lee CM, Pustejovsky J (2006) Machine learning of temporal relations. In: Proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the association for computational linguistics, ACL-44. pp 753–760
Mani I, Wellner B, Verhagen M, Pustejovsky J (2007) Three approaches to learning tlinks in timeml. Technical Report CS-07-268, Department of Computer Science, Brandeis University, Waltham, USA
McClosky D, Charniak E, Johnson M (2006) Effective self-training for parsing. In: Proceedings of N. American ACL (NAACL). pp 152–159
McClosky D, Manning CD (2012) Learning constraints for consistent timeline extraction. In: Proceedings of EMNLP
Mintz M, Bills S, Snow R, Jurafsky D (2009) Distant supervision for relation extraction without labeled data. In: ACL/AFNLP. pp 1003–1011
Moens M, Steedman M (1988) Temporal ontology and temporal reference. Comput Linguist 14:15–28
Ng AY (2004) Feature selection, L1 vs. L2 regularization, and rotational invariance. In: ICML
Parsons T (1990) Events in the semantics of English: a study in subatomic semantics. The MIT Press, Cambridge, Massachusetts
Partee B (1973) Some structural analogies between tenses and pronouns in English. J Philos 70:601–609
Partee B (1984) Nominal and temporal anaphora. Linguist Philos 7:243–286
Pustejovsky J, Castano J, Ingria R, Sauri R, Gaizauskas R, Setzer A, Katz G (2003) TimeML: robust specification of event and temporal expression in text. In: Fifth international workshop on computational semantics, IWCS-5
Pustejovsky J, Hanks P, Sauri R, See A, Gaizauskas R, Setzer A, Radev D, Sundheim B, Day D, Ferro L, Lazo M (2003) The TIMEBANK corpus. In: Proceedings of corpus linguistics 2003, Lancaster. pp 647–656
Pustejovsky J, Verhagen M (2010) SemEval-2010 task 13: evaluating events, time expressions, and temporal relations (TempEval-2). In: Proceedings of the workshop on semantic evaluations: recent achievements and future directions. pp 112–116
Pustejovsky J, Verhagen M (2010) SemEval-2010 Task 13: evaluating events, time expressions, and temporal relations (TempEval-2). In: SemEval-2010: 5th international workshop on semantic evaluations
Richardson M, Domingos P (2006) Markov logic networks. Mach Learn 62:107–136
Riedel S, Yao L, McCallum A (2010) Modeling relations and their mentions without labeled text. In: ECML-PKDD
Reichenbach H (1947) Elements of symbolic logic. Macmillan, New York
Schlaefer N, Ko J, Betteridge J, Sautter G, Pathak M, Nyberg E (2007) Semantic extensions of the Ephyra QA system for TREC. In: Proceedings of TREC 2007
Schockaert S, Cock MD, Ahn D, Kerre E (2006) Supporting temporal question answering: strategies for offline data collection. In: Proceedings of 5th international workshop on inference in computational semantics (ICoS-5)
Snodgrass R (1998) Of duplicates and septuplets. Database Program Des 11:46–49
Surdeanu M, Tibshirani J, Nallapati R, Manning CD (2012) Multi-instance multi-label learning for relation extraction. In: Proceedings of EMNLP
Takamatsu S, Sato I, Nakagawa H (2012) Reducing wrong labels in distant supervision for relation extraction. In: Proceedings of ACL
Talukdar PP, Wijaya D, Mitchell T (2012) Acquiring temporal constraints between relations. In: Proceedings of CIKM
Talukdar PP, Wijaya D, Mitchell T (2012) Coupled temporal scoping of relational facts. In: Proceedings of WSDM
Tamang S, Ji H (2011) Adding smarter systems instead of human annotators: Re-ranking for slot filling system combination. In: Proceedings of CIKM2011 workshop on search and mining entity-relationship data
Tatu M, Srikanth M (2008) Experiments with reasoning for temporal relations between events. In: COLING. pp 857–864
Taylor B (1977) Tense and continuity. Linguist Philos 1:199–220
Tenny C, Pustejovsky J (2000) A History of events in linguistic theory. In: Events as grammatical objects. CSLI Publications, pp 3–38
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc B 58(1):267–288
Tibshirani R (2011) Regression shrinkage and selection via the lasso: a retrospective. J R Stat Soc B 73(3):273–282
Trautwein M (2011) The time window of language, the interaction between linguistic and non-linguistic knowledge in the temporal interpretation of German and English texts. In: Language, Context, and Cognition (2). Walter de Gruyter, Berlin
Vendler Z (1967) Linguistics in philosophy. Cornell University Press, Ithaca, New York, USA
Verhagen M (2004) Times between the lines. Brandeis University, Massachusetts
Verhagen M (2005) Temporal closure in an annotation environment. Lang Resour Eval 39(2–3):211–241
Verhagen M, Gaizauskas R, Schilder F, Katz G, Pustejovsky J (2007) Semeval 2007 task 15: tempeval temporal relation identification. In: SemEval-2007: 4th international workshop on semantic evaluations
Verhagen M, Sauri R, Caselli T, Pustejovsky J (2010) Semeval-2010 task 13: tempeval 2. In: Proceedings of international workshop on semantic evaluations (SemEval 2010)
Wang Y, Yang B, Qu L, Spaniol M, Weikum G (2011) Harvesting facts from textual web sources by constrained label propagation. In: CIKM2011
Yoshikawa K, Riedel S, Asahara M, Matsumoto Y (2009) Jointly identifying temporal relations with Markov logic. In: Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP. pp 405–413
Acknowledgments
This work was supported by the U.S. Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053 (NS-CTA), the U.S. NSF CAREER Award under Grant IIS-0953149, the U.S. NSF EAGER Award under Grant No. IIS-1144111, the U.S. DARPA Deep Exploration and Filtering of Text (DEFT) Program under Grant No. FA8750-13-2-0041, and a CUNY Junior Faculty Award. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Cite this article
Ji, H., Cassidy, T., Li, Q. et al. Tackling representation, annotation and classification challenges for temporal knowledge base population. Knowl Inf Syst 41, 611–646 (2014). https://doi.org/10.1007/s10115-013-0675-1