TOPSIL-Miner: an efficient algorithm for mining top-K significant itemsets over data streams

Yang, Bei; Huang, Houkuan

doi:10.1007/s10115-009-0211-5

Regular Paper
Published: 19 May 2009

Volume 23, pages 225–242 (2010)
Knowledge and Information Systems

Bei Yang^1,2 & Houkuan Huang^2

Abstract

Frequent itemset mining over data streams has become a hot topic in data mining and knowledge discovery in recent years and has been applied in many areas. However, setting a minimum support threshold requires domain knowledge, and a poorly chosen threshold places a considerable burden on users. Users are therefore often more interested in finding the top-K frequent itemsets over a data stream. This paper presents TOPSIL-Miner, a dynamic, incremental, approximate algorithm for mining top-K significant itemsets in landmark windows. A new data structure, TOPSIL-Tree, is designed to store the potentially significant itemsets, and further data structures (the maximum support list, ordered item list, TOPSET, and minimum support list) are devised to maintain information about the mining results. Moreover, three optimization strategies are exploited to reduce the time and space cost of the algorithm: (1) pruning trivial nodes in the current data stream, (2) raising the mining support threshold adaptively and heuristically during the mining process, and (3) raising the pruning threshold dynamically. The accuracy of the algorithm is also analyzed. Extensive experiments are performed to evaluate the effectiveness, efficiency, and precision of the algorithm.
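To make the problem setting concrete, the following is a minimal Python sketch of top-K itemset counting over a landmark window, that is, over every transaction seen since a fixed starting point. It is not TOPSIL-Miner and does not use the TOPSIL-Tree: it counts itemsets exactly by brute force and does no pruning. The class name `LandmarkTopK`, the `max_size` cap on itemset length, and the tie-breaking rule are all assumptions made for illustration. It does show one idea from the abstract: once K itemsets are known, the K-th highest support acts as a support threshold that the user never had to choose.

```python
from collections import Counter
from itertools import combinations


class LandmarkTopK:
    """Brute-force top-K itemset counter over a landmark window.

    Illustrative only: exact counting, no tree structure, no pruning.
    """

    def __init__(self, k, max_size=2):
        self.k = k                # number of itemsets to report
        self.max_size = max_size  # hypothetical cap on itemset length
        self.counts = Counter()   # itemset (sorted tuple) -> support count
        self.n_transactions = 0   # transactions seen since the landmark

    def update(self, transaction):
        """Count every itemset of size 1..max_size in one transaction."""
        items = sorted(set(transaction))
        self.n_transactions += 1
        for size in range(1, self.max_size + 1):
            for itemset in combinations(items, size):
                self.counts[itemset] += 1

    def top_k(self):
        """Return the K itemsets with highest support.

        Ties are broken by itemset order (an assumption of this sketch).
        """
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[: self.k]

    def implied_threshold(self):
        """Support of the K-th itemset, or 0 if fewer than K are known."""
        top = self.top_k()
        return top[-1][1] if len(top) == self.k else 0


# Usage: feed transactions one at a time, then query.
miner = LandmarkTopK(k=3, max_size=2)
for transaction in [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]:
    miner.update(transaction)

for itemset, support in miner.top_k():
    print(itemset, support)
print("implied support threshold:", miner.implied_threshold())
```

Exact counting of this kind grows with the number of distinct itemsets, which is the cost that an approximate method with pruning and a rising threshold is meant to avoid.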

Author information

Authors and Affiliations

School of Information Engineering, Zhengzhou University, Zhengzhou, 450001, China
Bei Yang
School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
Bei Yang & Houkuan Huang

Corresponding author

Correspondence to Bei Yang.

About this article

Cite this article

Yang, B., Huang, H. TOPSIL-Miner: an efficient algorithm for mining top-K significant itemsets over data streams. Knowl Inf Syst 23, 225–242 (2010). https://doi.org/10.1007/s10115-009-0211-5

Received: 23 September 2008
Revised: 18 February 2009
Accepted: 21 March 2009
Published: 19 May 2009
Issue Date: May 2010
DOI: https://doi.org/10.1007/s10115-009-0211-5
