ABSTRACT
The process of extracting useful knowledge from large datasets has become one of the most pressing problems in today's society. The problem spans entire sectors, from scientists to intelligence analysts and web users, all of whom are constantly struggling to keep up with the larger and larger amounts of content published every day. With this much data, it is often easy to miss the big picture.
In this paper, we investigate methods for automatically connecting the dots -- providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system automatically finds a coherent chain linking them together. For example, it can recover the chain of events starting with the decline of home prices (January 2007), and ending with the ongoing health-care debate.
We formalize the characteristics of a good chain and provide an efficient algorithm (with theoretical guarantees) to connect two fixed endpoints. We incorporate user feedback into our framework, allowing the stories to be refined and personalized. Finally, we evaluate our algorithm over real news data. Our user studies demonstrate the algorithm's effectiveness in helping users understand the news.