ABSTRACT
We present a new method for mining and ranking streams of news stories using cross-stream sequential patterns and content similarity. In particular, we focus on stories reporting the same event across the streams within a given time window, where an event is defined as a specific thing that happens at a specific time and place. For every discovered cluster of stories reporting the same event, we create an itemset-sequence consisting of the stream identifiers of the stories in the cluster, ordered by the timestamps of the stories. We also record the exact timestamps and the content similarities between the respective stories. We use the resulting collection of itemset-sequences for two tasks: (I) to discover recurrent temporal publishing patterns between the news streams in terms of frequent sequential patterns and content similarity, and (II) to rank the streams of news stories with respect to the timeliness of reporting important events and content authority. We demonstrate the applicability of the method on a multi-stream of news stories gathered from the RSS feeds of major world news agencies.
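The pipeline summarized above (cluster same-event stories, turn each cluster into a timestamp-ordered itemset-sequence of stream identifiers, then mine and rank) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the authors' algorithm: the `Story` type, the greedy seed-based cosine clustering with `window` and `threshold` parameters, the function names, counting only length-2 patterns instead of running a full sequential-pattern miner, and a "first to report" count as a stand-in for timeliness. Content authority is not modeled, and the recorded similarity values are not kept.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import groupby
import math


@dataclass
class Story:
    stream: str     # stream identifier (hypothetical, e.g. an RSS feed name)
    timestamp: int  # publication time in arbitrary integer units (assumed)
    words: Counter  # bag-of-words representation of the story text


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_stories(stories, window, threshold):
    """Assumed greedy clustering: a story joins the first cluster whose seed
    (earliest story) is at most `window` older, at least `threshold` similar,
    and from a different stream than every story already in the cluster."""
    clusters = []
    for s in sorted(stories, key=lambda s: s.timestamp):
        for c in clusters:
            seed = c[0]
            if (s.timestamp - seed.timestamp <= window
                    and cosine(s.words, seed.words) >= threshold
                    and all(x.stream != s.stream for x in c)):
                c.append(s)
                break
        else:
            clusters.append([s])  # no matching cluster: start a new event
    return clusters


def itemset_sequence(cluster):
    """Stream identifiers ordered by timestamp; equal timestamps share an itemset."""
    ordered = sorted(cluster, key=lambda s: s.timestamp)
    return [frozenset(s.stream for s in group)
            for _, group in groupby(ordered, key=lambda s: s.timestamp)]


def ordered_pair_support(sequences):
    """Support of length-2 patterns <{a}{b}>: number of sequences in which
    stream a reports strictly before stream b (counted once per sequence)."""
    support = Counter()
    for seq in sequences:
        pairs = set()
        for i, earlier in enumerate(seq):
            for later in seq[i + 1:]:
                pairs.update((a, b) for a in earlier for b in later)
        support.update(pairs)
    return support


def timeliness_rank(sequences):
    """Illustrative proxy: rank streams by how often they are first to report."""
    counts = Counter()
    for seq in sequences:
        for itemset in seq:
            for stream in itemset:
                counts.setdefault(stream, 0)  # also list streams that are never first
        counts.update(seq[0])  # +1 for every stream in the earliest itemset
    return sorted(counts, key=lambda st: (-counts[st], st))


stories = [
    Story("reuters", 0, Counter("earthquake hits city".split())),
    Story("bbc", 5, Counter("earthquake hits city centre".split())),
    Story("cnn", 5, Counter("city earthquake hits".split())),
    Story("bbc", 100, Counter("election results announced".split())),
    Story("reuters", 110, Counter("election results announced today".split())),
]
sequences = [itemset_sequence(c) for c in cluster_stories(stories, window=60, threshold=0.5)]
print(sequences)
print(timeliness_rank(sequences))  # ['bbc', 'reuters', 'cnn']
print(ordered_pair_support(sequences))
```

With these toy stories, the two earthquake reports at time 5 fall into one itemset after the Reuters story, giving the sequence `<{reuters}{bbc, cnn}>`. Comparing each new story only against a cluster's seed keeps the sketch single-pass; a production system would plausibly use weighted term vectors and a proper frequent-sequence miner instead.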
Index Terms
- Mining and ranking streams of news stories using cross-stream sequential patterns