Abstract
Mining high average-utility itemsets (HAUIs) is a promising research topic in data mining because, unlike high utility itemsets, HAUIs are not biased toward long itemsets. Regardless of the upper bounds and pruning strategies they use, most existing HAUI mining algorithms are founded on the concept of maximal utility, namely the highest utility of a single item in each transaction. In this paper, we generalize the typical maximal utility and average-utility upper bound from a single item to an itemset, and propose an efficient HAUI mining algorithm based on generalized maximal utility (HAUIM-GMU). We first propose the concepts of generalized maximal utility and the generalized average-utility upper bound, and discuss how the proposed upper bound can be made tighter so that fewer candidates are generated. We then propose a new pruning strategy based on the concept of support and show that it is effective for filtering out unpromising itemsets. Finally, we describe the complete algorithm in detail. Extensive experimental results show that HAUIM-GMU outperforms existing state-of-the-art algorithms.
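To make the baseline concepts concrete, the following minimal sketch illustrates the classic single-item maximal utility that the abstract says most existing algorithms build on, together with the standard average-utility upper bound (auub) derived from it. It does not implement HAUIM-GMU, its generalized maximal utility, or its support-based pruning, whose exact definitions are not given here. The toy database, the unit profits, the threshold `min_au`, and all function names are hypothetical. The sketch assumes the usual definitions: u(X,T) is the sum of quantity times unit profit over the items of X, the average utility au(X) sums u(X,T)/|X| over the transactions containing X, and auub(X) sums the maximal utility mu(T) over those same transactions. Because an average never exceeds a maximum, au(X) ≤ auub(X), and auub can only shrink as X grows, so a depth-first search may safely discard every superset of an itemset whose auub falls below the threshold.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchase quantity.
DB = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2, "d": 4},
    {"a": 1, "b": 1, "d": 1},
]
# Hypothetical external utilities (unit profits) for each item.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}


def item_utility(item, t):
    """u(i, T): quantity of item i in transaction T times its unit profit."""
    return t[item] * PROFIT[item]


def contains(t, itemset):
    return all(i in t for i in itemset)


def average_utility(itemset, db=DB):
    """au(X): sum over transactions containing X of u(X, T) / |X|."""
    return sum(
        sum(item_utility(i, t) for i in itemset) / len(itemset)
        for t in db
        if contains(t, itemset)
    )


def maximal_utility(t):
    """mu(T): highest utility of a single item in transaction T."""
    return max(item_utility(i, t) for i in t)


def auub(itemset, db=DB):
    """Classic average-utility upper bound: sum of mu(T) over T containing X."""
    return sum(maximal_utility(t) for t in db if contains(t, itemset))


def mine_hauis(min_au, db=DB):
    """Depth-first HAUI search that prunes with the anti-monotone auub."""
    items = sorted({i for t in db for i in t})
    result = {}

    def extend(prefix, start):
        for k in range(start, len(items)):
            x = prefix + (items[k],)
            # If auub(X) < min_au, no superset of X can be an HAUI either.
            if auub(x, db) < min_au:
                continue
            au = average_utility(x, db)
            if au >= min_au:
                result[x] = au
            extend(x, k + 1)

    extend((), 0)
    return result


def brute_force_hauis(min_au, db=DB):
    """Exhaustive reference: evaluates every non-empty itemset, no pruning."""
    items = sorted({i for t in db for i in t})
    result = {}
    for r in range(1, len(items) + 1):
        for x in combinations(items, r):
            au = average_utility(x, db)
            if au >= min_au:
                result[x] = au
    return result


if __name__ == "__main__":
    print(mine_hauis(min_au=9))
    # {('a',): 20.0, ('a', 'c'): 9.5, ('b', 'd'): 9.5, ('d',): 15.0}
```

The brute-force function serves only as a cross-check: on this toy data both functions return the same itemsets, which shows that auub pruning loses no HAUIs. The looseness of this bound (for example, auub({c}) = 27 while au({c}) = 6) is the kind of gap that tighter upper bounds, such as the generalized bound described in the abstract, aim to reduce.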
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions, which helped to improve the quality of this paper. This work was partially supported by the National Natural Science Foundation of China (61977001) and the Great Wall Scholar Program (CIT & TCD20190305).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Song, W., Liu, L. & Huang, C. Generalized maximal utility for mining high average-utility itemsets. Knowl Inf Syst 63, 2947–2967 (2021). https://doi.org/10.1007/s10115-021-01614-z
DOI: https://doi.org/10.1007/s10115-021-01614-z