DOI:10.1145/2232817.2232859
research-article

Event-centric search and exploration in document collections

Published: 10 June 2012

Abstract

Textual data, ranging from corpora of digitized historic documents to large collections of news feeds, provide a rich source of temporal and geographic information. Such information has recently gained a lot of interest in support of different search and exploration tasks, e.g., organizing news along a timeline or placing the origin of documents on a map. For these tasks, however, the temporal and geographic information embedded in documents is often considered in isolation. We claim that combining such information into (chronologically ordered) event-like features makes interesting and meaningful search and exploration tasks possible. In this paper, we present a framework for the extraction, exploration, and visualization of event information in document collections. To this end, one has to identify and combine temporal and geographic expressions from documents, thus enriching a document collection with a set of normalized events. Traditional search queries can then be enriched with conditions on the events relevant to the search subject. Most important for our event-centric approach is that a search result consists of a sequence of events relevant to the search terms and not just a document hit list. Such events can originate from different documents and can be further explored; in particular, events relevant to a search query can be ordered chronologically. We demonstrate the utility of our framework through different (multilingual) search and exploration scenarios using a Wikipedia corpus.
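
The idea of an event as a combination of a temporal and a geographic expression can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that temporal and geographic expressions have already been extracted and normalized by upstream taggers (dates to ISO 8601 strings, places to a name with latitude and longitude), and that an event is formed whenever a date and a place occur in the same sentence. The co-occurrence rule, the names `Event`, `extract_events`, and `search`, and the sample sentences are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Event:
    date: str       # normalized temporal expression (assumed ISO 8601, YYYY-MM-DD)
    place: str      # normalized place name
    lat: float
    lon: float
    doc_id: str     # document the event was extracted from
    sentence: str   # textual context of the event


def extract_events(doc_id, sentences):
    """Pair every date with every place found in the same sentence.

    `sentences` is a list of (text, dates, places) tuples; the dates and
    places are assumed to come from temporal and geographic taggers.
    """
    events = []
    for text, dates, places in sentences:
        for date, (name, lat, lon) in product(dates, places):
            events.append(Event(date, name, lat, lon, doc_id, text))
    return events


def search(events, term, start=None, end=None):
    """Return events whose sentence mentions `term`, optionally restricted
    to a date range, ordered chronologically."""
    hits = [
        e for e in events
        if term.lower() in e.sentence.lower()
        and (start is None or e.date >= start)
        and (end is None or e.date <= end)
    ]
    # ISO 8601 dates of equal granularity sort correctly as plain strings.
    return sorted(hits, key=lambda e: e.date)


# Made-up, hand-annotated input from two documents.
corpus = {
    "doc-a": [
        ("The bridge opened in Heidelberg on 2001-05-04.",
         ["2001-05-04"], [("Heidelberg", 49.41, 8.69)]),
    ],
    "doc-b": [
        ("Planning for the bridge began in Mannheim on 1999-03-12.",
         ["1999-03-12"], [("Mannheim", 49.49, 8.47)]),
        ("The river is wide.", [], []),  # no date or place, so no event
    ],
}

all_events = [
    event
    for doc_id, sentences in corpus.items()
    for event in extract_events(doc_id, sentences)
]

results = search(all_events, "bridge")
for e in results:
    print(e.date, e.place, e.doc_id)
```

The result is a chronologically ordered sequence of events drawn from different documents rather than a list of document hits, which is the property the abstract emphasizes. A condition on the events, here a date range passed as `start` and `end`, narrows the same keyword query.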

Published In

JCDL '12: Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
June 2012
458 pages
ISBN:9781450311540
DOI:10.1145/2232817

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. corpora exploration
  2. event extraction
  3. geographic information
  4. querying
  5. temporal information

Qualifiers

  • Research-article

Conference

JCDL '12
Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%

Cited By

  • (2023) A survey on narrative extraction from textual data. Artificial Intelligence Review 56:8 (8393-8435). DOI:10.1007/s10462-022-10338-7. Online publication date: 6-Jan-2023
  • (2021) Event Occurrence Date Estimation based on Multivariate Time Series Analysis over Temporal Document Collections. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (398-407). DOI:10.1145/3404835.3462885. Online publication date: 11-Jul-2021
  • (2020) Temporal Information Access. Evaluating Information Retrieval and Access Tasks (127-141). DOI:10.1007/978-981-15-5554-1_9. Online publication date: 2-Sep-2020
  • (2019) NE². Knowledge and Information Systems 59:2 (311-335). DOI:10.1007/s10115-018-1208-8. Online publication date: 1-May-2019
  • (2018) JIM. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (637-646). DOI:10.1145/3269206.3271681. Online publication date: 17-Oct-2018
  • (2018) Extraction of spatio-temporal data about historical events from text documents. Transactions in GIS 22:3 (677-696). DOI:10.1111/tgis.12448. Online publication date: 17-Aug-2018
  • (2017) Towards building a knowledge base of monetary transactions from a news collection. Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries (209-218). DOI:10.5555/3200334.3200357. Online publication date: 19-Jun-2017
  • (2017) Modeling the Influence of Popular Trending Events on User Search Behavior. Proceedings of the 26th International Conference on World Wide Web Companion (535-544). DOI:10.1145/3041021.3054188. Online publication date: 3-Apr-2017
  • (2017) Towards Building a Knowledge Base of Monetary Transactions from a News Collection. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (1-10). DOI:10.1109/JCDL.2017.7991575. Online publication date: Jun-2017
  • (2016) Domain-Sensitive Temporal Tagging. Synthesis Lectures on Human Language Technologies 9:3 (1-151). DOI:10.2200/S00721ED1V01Y201606HLT036. Online publication date: 28-Jul-2016
