Abstract
Top-k frequent pattern mining finds the patterns with the highest supports, from the top-ranked support down to the k-th. Because it requires no user-specified minimum support threshold, the approach can be applied effectively in fields such as marketing, finance, and bio-data analysis. Top-k mining methods use the support of the k-th pattern in place of a user-specified minimum support, so they must conduct mining operations at very low supports until the k-th pattern is detected. Mining at a low support generates single-paths containing numerous items, and the top-k mining algorithm extracts valid patterns by combining the items of each single-path. The more combinations there are, the more time and memory are consumed. In this paper, in order to mine top-k frequent patterns more efficiently, we consider converting patterns obtained from single-paths into composite patterns during the mining process and restoring them to the original patterns when the top-k frequent patterns are extracted. To this end, we define a new concept, the composite pattern, and propose novel techniques for reducing pattern combinations in single-paths. We introduce two algorithms: CRM (Combination Reducing method), which applies our reduction technique, and CRMN (Combination Reducing method for N-itemset), which additionally considers N-itemsets, i.e., pattern lengths. A performance evaluation shows that the CRM and CRMN algorithms reduce pattern combinations in single-paths efficiently compared with state-of-the-art algorithms. The experimental results also show that our approaches perform well in terms of runtime, memory, and scalability.
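The combination cost described above can be made concrete with a small sketch. It is not CRM or CRMN, whose composite-pattern details are not given in this abstract; it is only a naive baseline that enumerates every item combination on one single-path. The function names, the `(item, count)` representation, and the rule that a combination's support is the smallest count among its items (as in an FP-tree style single path) are assumptions made for illustration.

```python
from itertools import combinations
import heapq


def single_path_patterns(path):
    """Enumerate every non-empty item combination on one single-path.

    `path` is a list of (item, count) pairs ordered from root to leaf.
    Assumption: the support of a combination is the smallest count
    among its items, as on an FP-tree style single path.
    """
    patterns = []
    for size in range(1, len(path) + 1):
        for combo in combinations(path, size):
            items = tuple(item for item, _ in combo)
            support = min(count for _, count in combo)
            patterns.append((items, support))
    return patterns


def top_k(patterns, k):
    """Return the k patterns with the highest supports."""
    return heapq.nlargest(k, patterns, key=lambda p: p[1])


# Hypothetical single-path with four items.
path = [("a", 5), ("b", 4), ("c", 4), ("d", 2)]
patterns = single_path_patterns(path)
print(len(patterns))  # 2**4 - 1 = 15 combinations
print(top_k(patterns, 3))

# The number of combinations doubles with every extra item on the path.
for n in (10, 20, 30):
    print(n, 2**n - 1)
```

A path with n items yields 2^n - 1 combinations, so the long single-paths produced at low supports dominate runtime and memory in such a baseline. This is the cost that the composite-pattern techniques aim to reduce.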
Acknowledgements
This research was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF No. 2013005682 and 20080062611).
Cite this article
Pyun, G., Yun, U. Mining top-k frequent patterns with combination reducing techniques. Appl Intell 41, 76–98 (2014). https://doi.org/10.1007/s10489-013-0506-9
DOI: https://doi.org/10.1007/s10489-013-0506-9