Abstract
Pattern mining is a data mining technique for discovering significant patterns, and it has been applied in areas such as disease analysis in medical databases and business decision making. Frequent pattern mining, which is based on item frequencies, is the most fundamental topic in the field. However, frequencies alone make it difficult to discover important patterns, because they do not reflect characteristics of real-world databases such as the relative importance of items and non-binary transactions. Utility pattern mining has therefore emerged as a research topic that addresses these characteristics. Meanwhile, in real-world applications, data newly generated by continuous operation, or data from other databases brought in for integrated analysis, can be added gradually to the current database. To handle existing and new data efficiently as a single database, the added data must be reflected in previous analysis results without analyzing the whole database again. In this paper, we propose an algorithm called HUPID-Growth (High Utility Patterns in Incremental Databases Growth) for mining high utility patterns in incremental databases. We also propose a tree structure named HUPID-Tree (High Utility Patterns in Incremental Databases Tree), which is constructed with a single database scan, and a restructuring method with a novel data structure called TIList (Tail-node Information List) to process incremental databases more efficiently. We evaluate performance experimentally against state-of-the-art algorithms. The experimental results show that the proposed algorithm processes real datasets more efficiently than previous ones.
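To make the notion of utility concrete, the following minimal sketch computes itemset utilities by brute force and shows how the set of high utility patterns can change once new transactions are appended. It is not HUPID-Growth and does not use HUPID-Tree or TIList; it assumes the common definition of utility as purchased quantity times per-unit profit, and the names `PROFIT`, `itemset_utility`, `high_utility_itemsets`, the sample transactions, and the absolute threshold of 12 are all hypothetical.

```python
from itertools import combinations

# Hypothetical per-unit profit of each item (its relative importance).
PROFIT = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, db, profit):
    """Sum quantity * profit over every transaction that contains all items."""
    total = 0
    for tx in db:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def high_utility_itemsets(db, profit, min_util):
    """Naively enumerate every itemset whose utility reaches min_util."""
    items = sorted({item for tx in db for item in tx})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, db, profit)
            if utility >= min_util:
                result[itemset] = utility
    return result


# Non-binary transactions: each maps an item to its purchased quantity.
original_db = [{"a": 1, "b": 2}, {"b": 4, "c": 3}, {"a": 2, "c": 1}]
increment = [{"a": 1, "b": 1, "c": 2}]  # newly arrived data

before = high_utility_itemsets(original_db, PROFIT, min_util=12)
# This naive version rescans the whole database after the increment,
# which is exactly the cost an incremental method aims to avoid.
after = high_utility_itemsets(original_db + increment, PROFIT, min_util=12)

print(before)
print(after)
```

In this toy data, itemsets such as `("a", "b")` fall below the threshold before the increment and reach it afterwards, which is why earlier results cannot simply be reused unchanged when data is added.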
Acknowledgments
This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ICT/SW Creative research program supervised by the NIPA (National ICT Industry Promotion Agency) (NIPA-2014-H0502-14-3008) and by the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF No. 2013-005682).
Cite this article
Yun, U., Ryang, H. Incremental high utility pattern mining with static and dynamic databases. Appl Intell 42, 323–352 (2015). https://doi.org/10.1007/s10489-014-0601-6