Abstract
One of the earliest applications of Information Extraction, motivated by the need for intelligence tools, is the detection of events in news articles. This detection can be difficult, however, when an article mentions several occurrences of the same kind of event, which is often done for comparison purposes. In this article, we propose new approaches for segmenting the text of news articles into units that each relate to a single event, in order to help identify the relevant information associated with the main event of the news. We present two approaches that use statistical machine learning models (HMM and CRF) and exploit temporal information extracted from the texts as the basis for this segmentation. The evaluation of these approaches in the domain of seismic events shows that a robust and generic approach can achieve results at least as good as those obtained with a specialized heuristic approach.
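To make the idea of HMM-based segmentation from temporal cues more concrete, the following is a minimal sketch, not the authors' model. It assumes each sentence has already been reduced to one hypothetical temporal observation (`NEAR_DATE`, `FAR_DATE`, or `NO_DATE`, relative to the date of the main event), and it uses hand-set probabilities chosen only for illustration. The state names, observation names, and all parameter values are assumptions; a real system would estimate them from annotated data. Viterbi decoding then labels each sentence as belonging to the main event or to a secondary one.

```python
import math

# Hidden states: does the sentence describe the main event or another one?
STATES = ("MAIN", "SECONDARY")

# Hypothetical, hand-set parameters (assumptions for illustration only).
START = {"MAIN": 0.8, "SECONDARY": 0.2}
TRANS = {
    "MAIN": {"MAIN": 0.8, "SECONDARY": 0.2},
    "SECONDARY": {"MAIN": 0.3, "SECONDARY": 0.7},
}
EMIT = {
    "MAIN": {"NEAR_DATE": 0.5, "FAR_DATE": 0.1, "NO_DATE": 0.4},
    "SECONDARY": {"NEAR_DATE": 0.1, "FAR_DATE": 0.5, "NO_DATE": 0.4},
}


def viterbi(observations):
    """Return the most likely state sequence for a list of temporal cues."""
    if not observations:
        return []
    # Log-probability of the best path ending in each state at the first sentence.
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best previous state from which to reach state s.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    cues = ["NEAR_DATE", "NO_DATE", "FAR_DATE", "FAR_DATE", "NO_DATE", "NEAR_DATE"]
    for cue, label in zip(cues, viterbi(cues)):
        print(f"{cue:10s} -> {label}")
```

Consecutive sentences that receive the same label form one segment. A CRF would replace the generative emission and transition tables with discriminatively trained feature weights, which makes it easier to combine several overlapping temporal features per sentence.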
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jean-Louis, L., Besançon, R., Ferret, O. (2010). Using Temporal Cues for Segmenting Texts into Events. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds) Advances in Natural Language Processing. NLP 2010. Lecture Notes in Computer Science, vol 6233. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14770-8_18
DOI: https://doi.org/10.1007/978-3-642-14770-8_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14769-2
Online ISBN: 978-3-642-14770-8
eBook Packages: Computer Science, Computer Science (R0)