DOI: 10.1145/2740908.2741692
research-article

Important Events in the Past, Present, and Future

Published: 18 May 2015

ABSTRACT

We address the problem of identifying important events in the past, present, and future from semantically annotated large-scale document collections. The semantic annotations that we consider are named entities (e.g., persons, locations, organizations) and temporal expressions (e.g., during the 1990s). More specifically, for a given time period of interest, our objective is to identify, rank, and describe the important events that fall within it. Our approach, P2F Miner, uses frequent itemset mining to identify events and to group the sentences related to them. It ranks the identified events with an information-theoretic measure and selects a representative sentence as the description of each event. Experiments on ClueWeb09, using events listed in Wikipedia year articles as ground truth, show that our approach is effective and outperforms a baseline based on statistical language models.
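The identify, rank, and describe steps in the abstract can be illustrated with a minimal Python sketch. It is not the P2F Miner implementation: the abstract does not name the mining algorithm, the ranking measure, or the rule for choosing a description, so everything below is an assumption. The sketch counts entity subsets by brute force, scores each frequent itemset with a pointwise Kullback-Leibler-style term (its share of the period's sentences times the log ratio against its share of the whole collection), and takes the shortest matching sentence as the description. The names `frequent_itemsets`, `mine_events`, and `SENTENCES`, and the toy annotated corpus itself, are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force count of entity subsets (size 2..max_size); keep the frequent ones."""
    counts = Counter()
    for items in transactions:
        for size in range(2, max_size + 1):
            for subset in combinations(sorted(items), size):
                counts[frozenset(subset)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def mine_events(sentences, year, min_support=2):
    """sentences: list of (text, entities, years).

    Returns (score, itemset, description) tuples, best score first.
    """
    # Assumption: temporal expressions are only used to select sentences.
    in_period = [(text, ents) for text, ents, years in sentences if year in years]
    if not in_period:
        return []
    frequent = frequent_itemsets([ents for _, ents in in_period], min_support)
    events = []
    for itemset, count in frequent.items():
        p_in = count / len(in_period)
        # Every in-period sentence is also in the collection, so bg >= count > 0.
        bg = sum(1 for _, ents, _ in sentences if itemset <= ents)
        p_bg = bg / len(sentences)
        score = p_in * log(p_in / p_bg)  # assumed stand-in for the paper's measure
        # Assumed description rule: shortest in-period sentence with all entities.
        description = min((t for t, e in in_period if itemset <= e), key=len)
        events.append((score, itemset, description))
    return sorted(events, key=lambda ev: ev[0], reverse=True)


# Hypothetical toy corpus: (sentence, entity annotations, years mentioned).
SENTENCES = [
    ("Obama was inaugurated in Washington in 2009.",
     {"Barack_Obama", "Washington_D.C."}, {2009}),
    ("In 2009, Barack Obama took office in Washington, D.C.",
     {"Barack_Obama", "Washington_D.C."}, {2009}),
    ("Michael Jackson died in Los Angeles in 2009.",
     {"Michael_Jackson", "Los_Angeles"}, {2009}),
    ("In 2009, Michael Jackson passed away in Los Angeles.",
     {"Michael_Jackson", "Los_Angeles"}, {2009}),
    ("Obama spoke in Washington in 2010.",
     {"Barack_Obama", "Washington_D.C."}, {2010}),
    ("Obama gave a speech in Washington in 2011.",
     {"Barack_Obama", "Washington_D.C."}, {2011}),
]

for score, itemset, description in mine_events(SENTENCES, 2009):
    print(f"{score:+.3f}  {sorted(itemset)}  {description}")
```

In this toy corpus the entity pair that also appears often outside 2009 scores lower than the pair that appears only in 2009, which is the kind of period-specific preference an information-theoretic ranking is meant to capture. At the scale of ClueWeb09, the brute-force counting would need to be replaced by a proper frequent itemset miner.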

    • Published in

      WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web
      May 2015
      1602 pages
      ISBN:9781450334730
      DOI:10.1145/2740908

      Copyright © 2015 Copyright is held by the International World Wide Web Conference Committee (IW3C2)

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
