
Finding story chains in newswire articles using random walks

Information Systems Frontiers

Abstract

Massive amounts of information about news events are published on the Internet every day in online newspapers, blogs, and social network messages. Search engines such as Google help users retrieve information by keyword, but the large volume of unstructured results they return makes it hard to track how an event evolves. A story chain is a set of news articles that reveals hidden relationships among different events, and traditional keyword-based search engines provide limited support for finding such chains. In this paper, we propose a random-walk-based algorithm to find story chains. Because many media outlets report the same event when breaking news happens, the algorithm includes two pruning mechanisms that automatically exclude redundant articles from the story chain and keep the algorithm efficient. We further explore how named entities and word relevance can help find relevant news articles, and we improve the algorithm's efficiency by building a co-clustering based correlation graph. Experimental results show that the proposed algorithm generates coherent story chains without redundancy, and that its efficiency improves significantly on the correlation graph.
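The abstract names the ingredients (random walks over articles, two pruning mechanisms, a chain free of redundant reports) without giving the algorithm itself, so the following Python is only a minimal sketch under our own assumptions, not the authors' method. It assumes each article is a bag of words, scores relevance with a random walk with restart on an article-word bipartite graph, and builds the chain greedily from a start article to an end article. The helper names (`random_walk_with_restart`, `jaccard`, `story_chain`), the parameters `restart`, `max_len` and `redundancy`, and both pruning rules (a publication-time window and a Jaccard near-duplicate cutoff) are hypothetical stand-ins.

```python
from collections import defaultdict


def random_walk_with_restart(docs, source, restart=0.3, iters=50):
    """Score every article's relevance to `source` with a random walk with
    restart on the bipartite article-word graph (hypothetical helper).

    docs maps article id -> set of words; returns article id -> probability.
    """
    word_docs = defaultdict(list)  # word side of the bipartite graph
    for d, words in docs.items():
        for w in words:
            word_docs[w].append(d)

    p = {d: 0.0 for d in docs}
    p[source] = 1.0
    for _ in range(iters):
        # Article -> word: each article spreads its mass evenly over its words.
        word_mass = defaultdict(float)
        for d, mass in p.items():
            if mass and docs[d]:
                for w in docs[d]:
                    word_mass[w] += mass / len(docs[d])
        # Word -> article, then jump back to the source with probability `restart`.
        new_p = {d: 0.0 for d in docs}
        for w, mass in word_mass.items():
            for d in word_docs[w]:
                new_p[d] += (1 - restart) * mass / len(word_docs[w])
        new_p[source] += restart
        p = new_p
    return p


def jaccard(a, b):
    """Word-overlap similarity between two articles (hypothetical helper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def story_chain(docs, times, start, end, max_len=6, redundancy=0.6):
    """Greedily link `start` to `end` through intermediate articles (hypothetical)."""
    rel_end = random_walk_with_restart(docs, end)
    chain, current = [start], start
    while len(chain) < max_len - 1:
        rel_cur = random_walk_with_restart(docs, current)
        candidates = [
            d for d in docs
            # Pruning 1 (efficiency, assumed): only articles published strictly
            # between the current article and the end article are considered.
            if times[current] < times[d] < times[end]
            # Pruning 2 (redundancy, assumed): skip near-duplicates of articles
            # already in the chain or of the end article.
            and all(jaccard(docs[d], docs[c]) < redundancy for c in chain + [end])
        ]
        if not candidates:
            break
        # Prefer articles relevant to both the current article and the end point.
        current = max(candidates, key=lambda d: rel_cur[d] * rel_end[d])
        chain.append(current)
    chain.append(end)
    return chain


if __name__ == "__main__":
    docs = {
        "a0": {"quake", "city", "damage"},
        "a1": {"quake", "city", "damage", "rescue"},  # near-duplicate of a0
        "a2": {"rescue", "aid", "teams"},
        "a3": {"aid", "donations", "relief"},
        "a4": {"relief", "rebuild", "city"},
        "a5": {"football", "match"},  # unrelated story
    }
    times = {"a0": 0, "a1": 1, "a2": 2, "a3": 3, "a4": 4, "a5": 2}
    print(story_chain(docs, times, "a0", "a4"))
```

On this toy corpus the near-duplicate `a1` fails the redundancy check and the unrelated `a5` never receives any walk probability, so every intermediate article in the printed chain comes from `a2` or `a3`. Recomputing a walk at each step is the costly part of this sketch; the abstract reports that a co-clustering based correlation graph improves efficiency, which the sketch does not attempt to model.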




Author information


Correspondence to Xianshu Zhu.

Additional information

This paper is an extended version of our previous paper, published in the proceedings of the 13th IEEE International Conference on Information Reuse and Integration (IRI 2012).


About this article

Cite this article

Zhu, X., Oates, T. Finding story chains in newswire articles using random walks. Inf Syst Front 16, 753–769 (2014). https://doi.org/10.1007/s10796-013-9420-2


  • DOI: https://doi.org/10.1007/s10796-013-9420-2
