ABSTRACT
Readers of news articles typically face the problem of getting a good understanding of a complex story covered in an article. However, as news articles mainly focus on current or recent events, they often do not provide sufficient information about the history of an event or topic, leaving the user alone in discovering and exploring other news articles that might be related to a given article. This is a time-consuming and non-trivial task, and the only help provided by some news outlets is a list of related articles or a few links within an article itself. What further complicates this task is that many of today's news stories cover a wide range of topics and events even within a single article, thus leaving the realm of traditional approaches that track a single topic or event over time.
In this paper, we present a framework to link news articles based on temporal expressions that occur in the articles, following the idea "if an article refers to something in the past, then there should be an article about that something". Our approach aims to recover the chronology of one or more events and topics covered in an article, leading to an information network of articles that can be explored in a thematic and, in particular, chronological fashion. For this, we propose a measure for the relatedness of articles that is primarily based on temporal expressions in articles but also exploits other information such as persons mentioned and keywords. We provide a comprehensive evaluation that demonstrates the functionality of our framework using a multi-source corpus of recent German news articles.
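The linking idea can be illustrated with a minimal sketch. The abstract does not give the actual relatedness measure, so everything below is an assumption made for illustration: the `Article` fields, the use of Jaccard overlap for persons and keywords, the exact-date match for the temporal signal, and the weights. It only shows the general shape of a score that is dominated by temporal expressions and refined by persons and keywords.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    # Hypothetical article record; field names are illustrative only.
    published: date
    dates_mentioned: set   # dates resolved from temporal expressions in the text
    persons: set           # names of persons mentioned
    keywords: set          # extracted keywords


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relatedness(source: Article, candidate: Article,
                weights=(0.6, 0.2, 0.2)) -> float:
    """Score how well `candidate` explains a past reference in `source`.

    Assumed scheme (not the paper's formula): a weighted sum of a temporal
    signal, person overlap, and keyword overlap.
    """
    # Only link backwards in time: the candidate must predate the source.
    if candidate.published >= source.published:
        return 0.0
    # Temporal signal: the source mentions the day the candidate was published.
    temporal = 1.0 if candidate.published in source.dates_mentioned else 0.0
    w_time, w_person, w_keyword = weights
    return (w_time * temporal
            + w_person * jaccard(source.persons, candidate.persons)
            + w_keyword * jaccard(source.keywords, candidate.keywords))


source = Article(date(2014, 3, 10), {date(2014, 2, 1)},
                 {"Person A", "Person B"}, {"election", "coalition"})
older = Article(date(2014, 2, 1), set(),
                {"Person A"}, {"election", "coalition"})
print(round(relatedness(source, older), 3))
```

Scoring every earlier article against a given one and keeping the highest-scoring pairs as edges would yield a network of articles of the kind the abstract describes; how the framework actually selects and weights links is not stated here.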
Index Terms
- Time will Tell: Temporal Linking of News Stories