Mining top-k frequent patterns with combination reducing techniques

Applied Intelligence

Abstract

Top-k frequent pattern mining finds the patterns ranked from the highest support down to the k-th highest support. Because it does not require a user-specified minimum support threshold, the approach can be applied effectively in fields such as marketing, finance, and bio-data analysis. Instead of a user-specified minimum support, top-k mining methods use the support of the k-th pattern as the threshold, so they must mine with very low supports until the k-th pattern is detected. When a low support is used, single-paths with numerous items are generated, and the top-k mining algorithm extracts valid patterns by combining the items of each single-path. Consequently, the more combinations there are, the more time and memory are consumed. In this paper, to mine top-k frequent patterns more efficiently, we consider converting patterns obtained from single-paths into composite patterns during the mining process and recovering the original patterns when the top-k frequent patterns are extracted. To this end, we define a new concept, the composite pattern, and propose novel techniques for reducing pattern combinations in single-paths. We introduce two algorithms: CRM (Combination Reducing Method), which applies our reduction techniques, and CRMN (Combination Reducing Method for N-itemset), which additionally considers N-itemsets, i.e., pattern lengths. A performance evaluation shows that CRM and CRMN reduce pattern combinations in single-paths more efficiently than state-of-the-art algorithms. The experimental results also show that our approaches perform well in terms of runtime, memory usage, and scalability.
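To make the cost described above concrete, the following Python sketch enumerates every item combination of a single-path and runs a brute-force top-k search on a toy database. It is an illustration only and is not the CRM or CRMN algorithm: the function names `single_path_combinations` and `naive_top_k`, the toy transactions, and the choice to keep all patterns tied with the k-th support are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def single_path_combinations(items):
    """Enumerate every non-empty combination of the items on one single-path.

    A path with n items yields 2**n - 1 combinations, which is the growth
    that combination-reducing techniques aim to avoid.
    """
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def naive_top_k(transactions, k):
    """Brute-force top-k mining (illustrative baseline, not CRM/CRMN).

    Counts the support of every itemset that occurs in the database and
    returns the patterns whose support is at least the k-th highest support.
    Keeping ties at the k-th support is an assumption of this sketch.
    """
    support = Counter()
    for transaction in transactions:
        for pattern in single_path_combinations(sorted(set(transaction))):
            support[pattern] += 1

    # Rank by descending support; break ties by item names for determinism.
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    if len(ranked) <= k:
        return ranked
    kth_support = ranked[k - 1][1]  # plays the role of the minimum support
    return [(pattern, count) for pattern, count in ranked if count >= kth_support]


if __name__ == "__main__":
    # Hypothetical toy database; each inner list is one transaction.
    database = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b", "c"]]
    for pattern, count in naive_top_k(database, k=3):
        print(sorted(pattern), count)

    # Combination growth for a single-path with 10 items.
    print(len(single_path_combinations(list("abcdefghij"))))
```

Each additional item on a single-path doubles the number of combinations, so the baseline becomes impractical quickly when low supports produce long paths.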

Acknowledgements

This research was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF No. 2013005682 and 20080062611).

Author information

Corresponding author

Correspondence to Unil Yun.

About this article

Cite this article

Pyun, G., Yun, U. Mining top-k frequent patterns with combination reducing techniques. Appl Intell 41, 76–98 (2014). https://doi.org/10.1007/s10489-013-0506-9

  • DOI: https://doi.org/10.1007/s10489-013-0506-9
