Incremental high utility pattern mining with static and dynamic databases

Yun, Unil; Ryang, Heungmo

doi:10.1007/s10489-014-0601-6

Published: 12 October 2014

Volume 42, pages 323–352, (2015)

Applied Intelligence

Unil Yun¹ &
Heungmo Ryang¹

Abstract

Pattern mining is a data mining technique for discovering significant patterns, and it has been applied in areas such as disease analysis in medical databases and business decision making. Frequent pattern mining, which is based on item frequencies, is the most fundamental topic in the field. However, frequency alone makes it difficult to discover important patterns, because it does not reflect characteristics of real-world databases such as the relative importance of items and non-binary transactions. Utility pattern mining has therefore emerged as a research topic that addresses these characteristics. Meanwhile, in real-world applications, data newly generated by continuous operation, or data from other databases brought in for integrated analysis, can be gradually added to the current database. To handle existing and new data efficiently as a single database, the added data must be reflected in previous analysis results without analyzing the whole database again. In this paper, we propose an algorithm called HUPID-Growth (High Utility Patterns in Incremental Databases Growth) for mining high utility patterns in incremental databases. We also propose HUPID-Tree (High Utility Patterns in Incremental Databases Tree), a tree structure constructed with a single database scan, and a restructuring method that uses a novel data structure called TIList (Tail-node Information List) to process incremental databases more efficiently. We evaluate performance in various experiments against state-of-the-art algorithms. The results show that the proposed algorithm processes real datasets more efficiently than previous ones.
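To make the problem setting concrete, the following is a minimal sketch of the standard high utility itemset definition, under the common assumption that an item's utility in a transaction is its quantity multiplied by a unit profit. It is a brute-force baseline that re-scans the whole database, not HUPID-Growth, HUPID-Tree, or TIList, whose details are not given in the abstract. The data and the names `profit`, `original_db`, `new_batch`, `itemset_utility`, and `high_utility_itemsets` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity
# (non-binary), and profit gives each item's relative importance.
profit = {"a": 5, "b": 2, "c": 1}
original_db = [{"a": 1, "b": 2}, {"b": 1, "c": 4}]
new_batch = [{"a": 2, "c": 1}]  # transactions added later


def itemset_utility(itemset, db, profit):
    """Sum quantity * unit profit over transactions containing every item."""
    total = 0
    for tx in db:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def high_utility_itemsets(db, profit, min_util):
    """Naive baseline: enumerate all itemsets and re-scan the full database."""
    items = sorted({item for tx in db for item in tx})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, db, profit)
            if utility >= min_util:
                result[itemset] = utility
    return result


before = high_utility_itemsets(original_db, profit, min_util=9)
after = high_utility_itemsets(original_db + new_batch, profit, min_util=9)
```

On this toy data, `before` contains only `("a", "b")`, while `after` also contains `("a",)` and `("a", "c")`. Two points follow. First, `("a", "b")` qualifies on the original data although its subset `("a",)` does not, so utility lacks the downward-closure property that frequency-based pruning relies on. Second, an itemset's utility over the merged database equals the sum of its utilities over the original data and the new batch, which is why previous results can in principle be updated rather than recomputed from scratch.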

Acknowledgments

This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ICT/SW Creative Research Program supervised by the NIPA (National ICT Industry Promotion Agency) (NIPA-2014-H0502-14-3008), and by the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF No. 2013-005682).

Author information

Authors and Affiliations

Department of Computer Engineering, Sejong University, Seoul, Republic of Korea
Unil Yun & Heungmo Ryang

Corresponding author

Correspondence to Unil Yun.

About this article

Cite this article

Yun, U., Ryang, H. Incremental high utility pattern mining with static and dynamic databases. Appl Intell 42, 323–352 (2015). https://doi.org/10.1007/s10489-014-0601-6

Published: 12 October 2014
Issue Date: March 2015
DOI: https://doi.org/10.1007/s10489-014-0601-6
