ABSTRACT
We address the problem of identifying important events in the past, present, and future from semantically annotated, large-scale document collections. The semantic annotations we consider are named entities (e.g., persons, locations, organizations) and temporal expressions (e.g., during the 1990s). More specifically, for a given time period of interest, our objective is to identify, rank, and describe the important events that fall within it. Our approach, P2F Miner, uses frequent itemset mining to identify events and to group the sentences related to them. It ranks the identified events with an information-theoretic measure and, for each event, selects a representative sentence as its description. Experiments on ClueWeb09, using events listed in Wikipedia year articles as ground truth, show that our approach is effective and outperforms a baseline based on statistical language models.
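The pipeline described in the abstract (mine frequent itemsets over sentence annotations, rank them, pick a describing sentence) can be illustrated with a small sketch. This is not the P2F Miner implementation: the abstract does not name the ranking measure, the mining algorithm, or the sentence-selection rule, so everything below is an assumption. The sketch enumerates itemsets by brute force, uses a support-weighted pointwise-mutual-information score as a stand-in for the information-theoretic measure, and picks the supporting sentence with the fewest extra annotations. The toy sentences and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mine_frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force miner: count every annotation subset up to max_size and
    keep those that occur in at least min_support sentences."""
    counts = Counter()
    for items in transactions:
        ordered = sorted(items)
        for size in range(1, max_size + 1):
            for combo in combinations(ordered, size):
                counts[frozenset(combo)] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


def association_score(itemset, frequent, n):
    """Assumed ranking measure (not taken from the paper): joint probability
    times the log-ratio of joint probability to the independence estimate."""
    joint = frequent[itemset] / n
    independent = 1.0
    for item in itemset:
        # Every subset of a frequent itemset is frequent, so singletons exist.
        independent *= frequent[frozenset([item])] / n
    return joint * log2(joint / independent)


def representative_sentence(itemset, sentences):
    """Assumed selection rule: among sentences containing the itemset, prefer
    the one with the fewest additional annotations, then the shortest text."""
    supporting = [(text, ann) for text, ann in sentences if itemset <= ann]
    text, _ = min(supporting, key=lambda p: (len(p[1] - itemset), len(p[0])))
    return text


def rank_events(sentences, min_support=2):
    """Treat each frequent itemset with two or more annotations as an event
    and return (itemset, score, description) tuples, best first."""
    n = len(sentences)
    frequent = mine_frequent_itemsets([ann for _, ann in sentences], min_support)
    events = [s for s in frequent if len(s) >= 2]
    scored = [
        (s, association_score(s, frequent, n), representative_sentence(s, sentences))
        for s in events
    ]
    return sorted(scored, key=lambda e: e[1], reverse=True)


# Hypothetical annotated sentences: (text, {entities and temporal expressions}).
sentences = [
    ("The Berlin Wall fell in 1989.", {"Berlin_Wall", "1989"}),
    ("In 1989 the Berlin Wall came down in Germany.", {"Berlin_Wall", "1989", "Germany"}),
    ("Germany reunified in 1990.", {"Germany", "1990"}),
    ("Germany won the World Cup in 1990.", {"Germany", "1990", "FIFA_World_Cup"}),
]

ranked = rank_events(sentences, min_support=2)
for itemset, score, description in ranked:
    print(sorted(itemset), round(score, 3), description)
```

On a collection the size of ClueWeb09, exhaustive enumeration would not be practical; a dedicated frequent itemset miner would take the place of `mine_frequent_itemsets`, and the annotations would first be restricted to sentences whose temporal expressions fall in the time period of interest.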
Index Terms
- Important Events in the Past, Present, and Future