Incremental high utility pattern mining with static and dynamic databases

Applied Intelligence

Abstract

Pattern mining is a data mining technique for discovering significant patterns, and it has been applied in areas such as disease analysis in medical databases and business decision making. Frequent pattern mining, which is based on item frequencies, is the most fundamental topic in the field. However, frequency alone makes it difficult to discover important patterns, because it does not reflect characteristics of real-world databases such as the relative importance of items and non-binary item quantities in transactions. Utility pattern mining has therefore emerged as a research topic that takes these characteristics into account. Meanwhile, in real-world applications, data newly generated by continuous operation, or data from other databases brought in for integrated analysis, can be gradually added to the current database. To handle existing and new data efficiently as a single database, the added data must be reflected in previous analysis results without analyzing the whole database again. In this paper, we propose an algorithm called HUPID-Growth (High Utility Patterns in Incremental Databases Growth) for mining high utility patterns in incremental databases. We also propose a tree structure named HUPID-Tree (High Utility Patterns in Incremental Databases Tree), which is constructed with a single database scan, and a restructuring method that uses a novel data structure called TIList (Tail-node Information List) to process incremental databases more efficiently. We evaluate performance in various experiments against state-of-the-art algorithms. The experimental results show that the proposed algorithm processes real datasets more efficiently than previous ones.
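To make the contrast between frequency and utility concrete, the following is a minimal sketch of the general idea and not of HUPID-Growth itself. It assumes the definition of utility that is common in the utility mining literature (quantity in the transaction times a per-unit profit), which the abstract does not spell out. The `PROFIT` table, the sample transactions, the threshold of 10, and all function names are hypothetical. The sketch enumerates every itemset by brute force, which is exponential and is exactly the cost that tree-based methods are designed to avoid; its only purpose is to show how a new batch can be folded into existing totals without rescanning the original data.

```python
from itertools import combinations

# Hypothetical per-unit profit (relative importance) of each item.
PROFIT = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, transaction):
    """Utility of an itemset in one transaction.

    A transaction maps item -> purchased quantity (non-binary).
    Returns the sum of quantity * profit, or 0 if the itemset is not contained.
    """
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * PROFIT[item] for item in itemset)


def add_transactions(totals, transactions):
    """Fold new transactions into running utility totals.

    Only the new transactions are scanned; earlier data is never revisited.
    """
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                totals[itemset] = totals.get(itemset, 0) + itemset_utility(itemset, t)
    return totals


def high_utility(totals, min_util):
    """Keep itemsets whose accumulated utility reaches the threshold."""
    return {x: u for x, u in totals.items() if u >= min_util}


original_db = [{"a": 1, "b": 2}, {"b": 3, "c": 4}]
new_batch = [{"a": 2, "c": 1}]

totals = add_transactions({}, original_db)
before = high_utility(totals, 10)

totals = add_transactions(totals, new_batch)  # incremental update
after = high_utility(totals, 10)

print(before)
print(after)
```

In this toy data, item `a` is not a high utility pattern in the original database but becomes one after the new batch is added, which is the kind of change an incremental method has to capture.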

Acknowledgments

This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ICT/SW Creative research program supervised by the NIPA (National ICT Industry Promotion Agency) (NIPA-2014-H0502-14-3008), and by the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF No. 2013-005682).

Author information

Corresponding author

Correspondence to Unil Yun.

About this article

Cite this article

Yun, U., Ryang, H. Incremental high utility pattern mining with static and dynamic databases. Appl Intell 42, 323–352 (2015). https://doi.org/10.1007/s10489-014-0601-6

  • DOI: https://doi.org/10.1007/s10489-014-0601-6
