Abstract
In this paper, we present a novel algorithm for efficiently mining high average-utility itemsets (HAUIs) from incremental databases, whose volumes can grow dynamically. Previous algorithms are inefficient because they follow the basic framework of an Apriori-like approach: they must scan the database multiple times to generate candidate itemsets and determine valid itemsets level by level. This drawback can cause critical problems when processing incremental databases, since scanning a database becomes more costly as it grows. In contrast, the proposed algorithm builds a compact tree structure that maintains all necessary information, avoiding such excessive database scans during mining. Previous algorithms also generate huge numbers of unnecessary candidate itemsets at each level because of the naive, combination-based candidate generation of the Apriori-like approach, which produces candidate itemsets of length k+1 by simply joining itemsets of length k. Our algorithm instead employs the pattern-growth approach, which generates only essential candidate itemsets. To preserve the compactness of its tree structure throughout the entire incremental mining process, the algorithm exploits a restructuring technique. The performance evaluation shows that our algorithm runs faster and consumes less memory than its competitors.
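The abstract relies on the notion of average utility without defining it. Below is a minimal Python sketch, assuming the standard definitions from the average-utility mining literature. The average utility of itemset X in transaction T is the sum of its item utilities divided by |X|. The transaction-maximum utility, the largest single-item utility in T, yields an anti-monotone upper bound used for pruning. The profit table, toy database, threshold, and function names (`mine`, `tmu`, and so on) are hypothetical. The depth-first enumeration over projected transactions is a generic pattern-growth illustration, not the authors' tree structure or restructuring technique. It also handles an insertion by re-mining everything, which is exactly the repeated scanning the proposed algorithm is designed to avoid.

```python
# Hypothetical unit profits (external utilities); not taken from the paper.
PROFIT = {"a": 3, "b": 1, "c": 5, "d": 2}

# Hypothetical toy database: each transaction maps item -> purchased quantity.
DB = [
    {"a": 1, "b": 2, "c": 1},
    {"b": 4, "d": 3},
    {"a": 2, "c": 2, "d": 1},
]


def item_utility(item, tx):
    """u(i, T): unit profit times the quantity of item i in transaction T."""
    return PROFIT[item] * tx[item]


def average_utility(itemset, tx):
    """au(X, T) = (sum of u(i, T) for i in X) / |X|, assuming T contains X."""
    return sum(item_utility(i, tx) for i in itemset) / len(itemset)


def tmu(tx):
    """Transaction-maximum utility: the largest single-item utility in T."""
    return max(item_utility(i, tx) for i in tx)


def mine(db, min_au):
    """Return {itemset: au(X)} for every itemset X with au(X) >= min_au.

    au(X) is the sum of au(X, T) over the transactions T containing X.
    Itemsets are grown depth-first from a fixed item order (a generic
    pattern-growth sketch), pruning with the sum of tmu(T) over the
    transactions that contain X.
    """
    items = sorted({i for tx in db for i in tx})
    result = {}

    def grow(prefix, projected, candidates):
        for k, item in enumerate(candidates):
            new_prefix = prefix + (item,)
            # Keep only the transactions that also contain the new item.
            proj = [tx for tx in projected if item in tx]
            # au(X, T) <= tmu(T), and every extension of new_prefix occurs
            # only in a subset of proj, so this bound caps all of them too.
            if sum(tmu(tx) for tx in proj) < min_au:
                continue
            au = sum(average_utility(new_prefix, tx) for tx in proj)
            if au >= min_au:
                result[new_prefix] = au
            grow(new_prefix, proj, candidates[k + 1:])

    grow((), db, items)
    return result


print(mine(DB, 8))
# {('a',): 9.0, ('a', 'c'): 12.0, ('c',): 15.0, ('d',): 8.0}

# A newly inserted transaction; this naive sketch simply re-mines everything.
updated = DB + [{"b": 3, "c": 1}]
print(mine(updated, 8))
# {('a',): 9.0, ('a', 'c'): 12.0, ('b',): 9.0, ('c',): 20.0, ('d',): 8.0}
```

Adding one transaction turns {b} into a HAUI (its average utility rises from 6 to 9) and raises au({c}) to 20. The proposed algorithm instead keeps the necessary information in its compact tree so that such updates do not require rescanning the whole database.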
Acknowledgments
This research was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF No. 20152062051 and NRF No. 20155054624), and by the Business for Academic-industrial Cooperative establishments funded by the Korea Small and Medium Business Administration in 2015 (Grant No. C0261068).
About this article
Cite this article
Kim, D., Yun, U. Efficient algorithm for mining high average-utility itemsets in incremental transaction databases. Appl Intell 47, 114–131 (2017). https://doi.org/10.1007/s10489-016-0890-z
DOI: https://doi.org/10.1007/s10489-016-0890-z