DOI: 10.1145/1835804.1835884
research-article

Connecting the dots between news articles

Published: 25 July 2010

ABSTRACT

The process of extracting useful knowledge from large datasets has become one of the most pressing problems in today's society. The problem spans entire sectors, from scientists to intelligence analysts and web users, all of whom are constantly struggling to keep up with the larger and larger amounts of content published every day. With this much data, it is often easy to miss the big picture.

In this paper, we investigate methods for automatically connecting the dots -- providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system automatically finds a coherent chain linking them together. For example, it can recover the chain of events starting with the decline of home prices (January 2007), and ending with the ongoing health-care debate.

We formalize the characteristics of a good chain and provide an efficient algorithm (with theoretical guarantees) to connect two fixed endpoints. We incorporate user feedback into our framework, allowing the stories to be refined and personalized. Finally, we evaluate our algorithm over real news data. Our user studies demonstrate the algorithm's effectiveness in helping users understand the news.
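
To make the task concrete, here is a minimal sketch of connecting two fixed endpoint articles with a chain. It is not the paper's algorithm: the abstract does not specify the coherence objective, so this sketch assumes a simple stand-in (Jaccard word overlap between consecutive articles, scored by the weakest link) and uses brute-force search over a toy corpus. The names `jaccard`, `best_chain`, and the sample headlines are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-overlap similarity between two texts (an assumed stand-in for coherence)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def best_chain(articles, length):
    """Pick a chain of `length` articles from a chronologically ordered list.

    The first and last articles are the fixed endpoints. Among all ways to
    choose the intermediate articles (kept in chronological order), return
    the chain whose weakest consecutive link is strongest.
    Brute force: only suitable for toy inputs.
    """
    n = len(articles)
    if length < 2 or length > n:
        raise ValueError("length must be between 2 and len(articles)")
    best, best_score = None, -1.0
    # combinations() yields index tuples in increasing order,
    # so chronological order is preserved.
    for middle in combinations(range(1, n - 1), length - 2):
        chain = (0, *middle, n - 1)
        score = min(
            jaccard(articles[i], articles[j])
            for i, j in zip(chain, chain[1:])
        )
        if score > best_score:
            best, best_score = chain, score
    return list(best), best_score


# Hypothetical toy corpus, ordered by publication date.
articles = [
    "home prices decline",
    "sports team wins final",
    "home prices mortgage defaults",
    "mortgage defaults bank losses",
    "bank losses government bailout",
]

chain, score = best_chain(articles, 4)
for i in chain:
    print(articles[i])
```

On this toy input the unrelated sports article is skipped, because any chain passing through it has a link with zero overlap. A real system would need a richer notion of coherence and a search method that scales beyond a handful of articles.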

Supplemental Material

kdd2010_shahaf_cdb_01.mov (mov, 122.4 MB)


      • Published in

        KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
        July 2010
        1240 pages
        ISBN:9781450300551
        DOI:10.1145/1835804

        Copyright © 2010 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
