Online detection of bursty events and their evolution in news streams

Journal of Zhejiang University SCIENCE C

Abstract

Online monitoring of temporally sequenced news streams for interesting patterns and trends has gained popularity in the last decade. In this paper, we study a particular news stream monitoring task: timely detection of bursty events that have happened recently, and discovery of their evolutionary patterns along the timeline. Here, a news stream is represented as a set of feature streams over tens of thousands of features (i.e., keywords; each news story consists of a set of keywords). A bursty event is therefore composed of a group of bursty features, which show bursty rises in frequency as the related event emerges. We give a formal definition of this problem and present a solution with the following steps: (1) applying an online multi-resolution burst detection method to identify bursty features with different bursty durations within a recent time period; (2) clustering bursty features to form bursty events and associating each event with a power value that reflects its bursty level; (3) applying an information retrieval method based on cosine similarity to discover the event’s evolution (i.e., highly related bursty events in history) along the timeline. We extensively evaluate the proposed methods on the Reuters Corpus Volume 1. Experimental results show that our methods can detect bursty events in a timely way and effectively discover their evolution. The power values used in our model not only measure an event’s bursty level, or relative importance, at a given time point, but also show the relative strengths of events along the same evolution.
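Steps (1) and (3) can be illustrated with a minimal sketch. The abstract does not specify the burst detection rule or the event representation, so everything below is an assumption made for illustration: a window sum is flagged as bursty when it exceeds the mean plus k standard deviations of all window sums at that resolution, and an event is a hypothetical keyword-to-weight dictionary. The names `bursty_windows`, `cosine`, `related_history`, `k`, and `min_sim` are not from the paper.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: a bursty event maps each keyword to a weight.
Event = Dict[str, float]


def bursty_windows(counts: List[int], window: int, k: float = 2.0) -> List[int]:
    """Return the end index of every window whose sum exceeds
    mean + k * std of all window sums (an assumed threshold rule)."""
    sums = [sum(counts[i - window + 1 : i + 1])
            for i in range(window - 1, len(counts))]
    if not sums:
        return []
    mean = sum(sums) / len(sums)
    std = math.sqrt(sum((s - mean) ** 2 for s in sums) / len(sums))
    threshold = mean + k * std
    # sums[j] covers counts[j : j + window], so it ends at j + window - 1.
    return [j + window - 1 for j, s in enumerate(sums) if s > threshold]


def cosine(a: Event, b: Event) -> float:
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(w * b[key] for key, w in a.items() if key in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_history(current: Event,
                    history: List[Tuple[str, Event]],
                    min_sim: float = 0.3) -> List[Tuple[str, float]]:
    """Rank past events by similarity to the current one, keeping
    only those at or above min_sim (an assumed cut-off)."""
    scored = [(label, cosine(current, event)) for label, event in history]
    kept = [(label, sim) for label, sim in scored if sim >= min_sim]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    daily_counts = [1, 1, 1, 1, 1, 1, 1, 1, 9, 10, 1, 1]
    # Scan the same feature stream at several resolutions.
    for window in (1, 2):
        print(window, bursty_windows(daily_counts, window))

    current = {"quake": 2.0, "tsunami": 1.0}
    history = [("e1", {"quake": 1.0, "rescue": 1.0}),
               ("e2", {"election": 1.0})]
    print(related_history(current, history))
```

Running the detector at several window sizes mimics the idea of catching bursts of different durations. A real online system would update the window sums incrementally rather than recomputing them over the full history.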



Author information

Corresponding author

Correspondence to Can Wang.

Additional information

Project (No. 2008BAH26B00) supported by the National Key Technology R&D Program of China.


About this article

Cite this article

Chen, W., Chen, C., Zhang, Lj. et al. Online detection of bursty events and their evolution in news streams. J. Zhejiang Univ. - Sci. C 11, 340–355 (2010). https://doi.org/10.1631/jzus.C0910245


  • DOI: https://doi.org/10.1631/jzus.C0910245
