Important Events in the Past, Present, and Future

Authors:
Abdalghani Abujabal, Max Planck Institute for Informatics, Saarbruecken, Germany
Klaus Berberich, Max Planck Institute for Informatics, Saarbruecken, Germany
WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web, May 2015, Pages 1315–1320. https://doi.org/10.1145/2740908.2741692

Published: 18 May 2015

ABSTRACT

We address the problem of identifying important events in the past, present, and future from semantically annotated large-scale document collections. The semantic annotations we consider are named entities (e.g., persons, locations, organizations) and temporal expressions (e.g., during the 1990s). More specifically, for a given time period of interest, our objective is to identify, rank, and describe important events that happened. Our approach, P2F Miner, uses frequent itemset mining to identify events and to group the sentences related to them. It ranks the identified events with an information-theoretic measure and, for each event, selects a representative sentence as its description. Experiments on ClueWeb09, using events listed in Wikipedia year articles as ground truth, show that our approach is effective and outperforms a baseline based on statistical language models.
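
The pipeline outlined in the abstract (mine frequent itemsets over annotated sentences, group sentences by itemset, then rank) can be illustrated with a minimal Python sketch. This is not the P2F Miner implementation. The function names `mine_events` and `rank_events`, the brute-force enumeration of itemsets, the toy sentences, and in particular the ranking score (a pointwise-mutual-information-style quantity) are all assumptions made for illustration; the abstract does not say which information-theoretic measure the authors use.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log2


def mine_events(sentences, min_support=2, max_size=3):
    """Group sentences by frequent itemsets of their annotations.

    `sentences` is a list of (text, annotations) pairs, where `annotations`
    is a set of entity and time labels. Itemsets of size 2..max_size are
    enumerated by brute force (an assumption; a real system would use a
    dedicated frequent itemset mining algorithm).
    """
    groups = defaultdict(list)
    for text, annotations in sentences:
        items = sorted(annotations)
        for size in range(2, max_size + 1):
            for itemset in combinations(items, size):
                groups[frozenset(itemset)].append(text)
    # Keep only itemsets supported by at least `min_support` sentences.
    return {s: texts for s, texts in groups.items() if len(texts) >= min_support}


def rank_events(events, sentences):
    """Rank itemsets by an assumed PMI-style score (a stand-in measure)."""
    n = len(sentences)
    item_counts = Counter(a for _, annotations in sentences for a in annotations)

    def score(itemset):
        p_joint = len(events[itemset]) / n
        p_indep = 1.0
        for item in itemset:
            p_indep *= item_counts[item] / n
        # High when the items co-occur more often than independence predicts.
        return p_joint * log2(p_joint / p_indep)

    return sorted(events, key=score, reverse=True)


# Toy annotated sentences (hypothetical input, not taken from ClueWeb09).
sentences = [
    ("Barack Obama was inaugurated in Washington in 2009.",
     {"Barack_Obama", "Washington_D.C.", "2009"}),
    ("Obama took the oath of office in Washington.",
     {"Barack_Obama", "Washington_D.C."}),
    ("In 2009, Obama received the Nobel Peace Prize.",
     {"Barack_Obama", "2009", "Nobel_Peace_Prize"}),
    ("Washington hosted a marathon.",
     {"Washington_D.C."}),
]

events = mine_events(sentences)
ranked = rank_events(events, sentences)
```

Here each frequent itemset stands for a candidate event, and the sentences grouped under it are the pool from which a representative description could be chosen. That selection step is left out of the sketch, since the abstract does not describe the criterion.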

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published in
WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web
May 2015, 1602 pages
ISBN: 9781450334730
DOI: 10.1145/2740908
General Chairs: Aldo Gangemi (National Research Council, Italy & Paris 13 University-CNRS, France), Stefano Leonardi (Sapienza University of Rome, Italy), Alessandro Panconesi (Sapienza University of Rome, Italy)
Copyright © 2015. Copyright is held by the International World Wide Web Conference Committee (IW3C2)
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
event detection
temporal information retrieval
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%