TOPSIL-Miner: an efficient algorithm for mining top-K significant itemsets over data streams

  • Regular Paper
Knowledge and Information Systems

Abstract

Frequent itemset mining over data streams has become a hot topic in data mining and knowledge discovery in recent years and has been applied in many areas. However, setting a minimum support threshold requires domain knowledge, and an unreasonable threshold places a heavy burden on users. It is therefore attractive to let users ask for the top-K frequent itemsets over a data stream instead. In this paper, a dynamic, incremental, approximate algorithm, TOPSIL-Miner, is presented to mine top-K significant itemsets in landmark windows. A new data structure, TOPSIL-Tree, is designed to store the potentially significant itemsets, and four further data structures (the maximum support list, the ordered item list, TOPSET, and the minimum support list) are devised to maintain information about the mining results. Moreover, three optimization strategies are exploited to reduce the time and space cost of the algorithm: (1) pruning trivial nodes in the current data stream, (2) raising the mining support threshold adaptively and heuristically during the mining process, and (3) raising the pruning threshold dynamically. The accuracy of the algorithm is also analyzed. Extensive experiments are performed to evaluate the effectiveness, efficiency, and precision of the algorithm.
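To make the problem setting concrete, the following is a minimal sketch of top-K itemset mining over a landmark window, that is, over every transaction seen since a fixed starting point. It is not TOPSIL-Miner: it counts exactly rather than approximately, uses no TOPSIL-Tree and no pruning, and caps itemset length with a hypothetical `max_len` parameter to keep the enumeration small. The function names `top_k_itemsets` and `implied_threshold` and the sample transactions are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations
import heapq


def top_k_itemsets(stream, k, max_len=2):
    """Return the k itemsets with the highest support in a landmark window.

    Exact brute-force counting; illustrative only, not the paper's algorithm.
    """
    counts = Counter()
    for transaction in stream:
        items = sorted(set(transaction))  # canonical order, duplicates removed
        for size in range(1, max_len + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # Highest support first; ties are broken lexicographically so the
    # result is deterministic.
    return heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))


def implied_threshold(top):
    """Support of the K-th result: the threshold the user never had to set."""
    return top[-1][1] if top else 0


transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "d"],
]
result = top_k_itemsets(transactions, k=3)
print(result)
print(implied_threshold(result))
```

The support of the K-th itemset acts as a minimum support threshold that is discovered from the data instead of being supplied by the user. A streaming algorithm can raise such a threshold as more transactions arrive, which is the general idea behind the threshold-raising strategies the abstract mentions.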

Author information

Correspondence to Bei Yang.

About this article

Cite this article

Yang, B., Huang, H. TOPSIL-Miner: an efficient algorithm for mining top-K significant itemsets over data streams. Knowl Inf Syst 23, 225–242 (2010). https://doi.org/10.1007/s10115-009-0211-5

  • DOI: https://doi.org/10.1007/s10115-009-0211-5
