Abstract
High-utility itemset mining (HUIM) is an important data mining task that can retrieve more meaningful and useful patterns for decision-making. One-phase HUIM algorithms based on the utility-list structure have been shown to be the most efficient, as they mine high-utility itemsets (HUIs) without generating candidates. However, storing itemset information in the utility-list is time- and memory-consuming. To address this problem, we propose an efficient simplified utility-list-based HUIM algorithm (HUIM-SU). In HUIM-SU, a simplified utility-list is used to obtain all HUIs effectively while reducing memory usage during the depth-first search. Based on the simplified utility-list, repeated pruning according to the transaction-weighted utilization (TWU) reduces the number of items. In addition, a construction tree and compressed storage are introduced to further reduce the search space and memory usage. The extension utility and the itemset TWU are then proposed as upper bounds, which reduce the search space considerably. Extensive experimental results on dense and sparse datasets indicate that HUIM-SU is highly efficient in terms of the number of candidates, memory usage, and execution time.
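The abstract does not define the simplified utility-list itself, so the sketch below only illustrates the classic utility-list scheme it builds on (in the style of HUI-Miner), together with the repeated TWU pruning the abstract mentions. It is a minimal sketch under stated assumptions, not the HUIM-SU implementation: it omits the construction tree, compressed storage, and the extension-utility bound, assumes each transaction is a dict mapping an item to its precomputed utility, and all function names (`compute_twu`, `prune_by_twu`, `build_utility_lists`, `join`, `search`, `mine_huis`) are hypothetical.

```python
# Sketch of the classic (HUI-Miner-style) utility-list scheme with repeated
# TWU pruning. NOT HUIM-SU's simplified utility-list; names are hypothetical.
from collections import defaultdict


def compute_twu(db):
    """TWU(i): sum of the utilities of all transactions that contain item i."""
    twu = defaultdict(int)
    for t in db:
        tu = sum(t.values())  # transaction utility
        for item in t:
            twu[item] += tu
    return twu


def prune_by_twu(db, minutil):
    """Repeatedly drop items with TWU < minutil. Each removal lowers the
    utilities of the affected transactions, which can push further items
    below the threshold, so loop until nothing changes."""
    while True:
        twu = compute_twu(db)
        keep = {i for i, w in twu.items() if w >= minutil}
        if len(keep) == len(twu):
            return db, twu
        db = [{i: u for i, u in t.items() if i in keep} for t in db]
        db = [t for t in db if t]  # discard transactions that became empty


def build_utility_lists(db, twu):
    """One utility-list per item: entries (tid, iutil, rutil), where rutil sums
    the utilities of the items that follow it in ascending-TWU order."""
    order = sorted(twu, key=lambda i: (twu[i], i))
    rank = {item: r for r, item in enumerate(order)}
    lists = {item: [] for item in order}
    for tid, t in enumerate(db):
        remaining = sum(t.values())
        for item in sorted(t, key=rank.__getitem__):
            remaining -= t[item]
            lists[item].append((tid, t[item], remaining))
    return [((item,), lists[item]) for item in order]


def join(prefix_ul, ul_x, ul_y):
    """Utility-list of Pxy from those of Px and Py (prefix_ul is P's list,
    or None when P is empty). Subtracting P's utility avoids counting it twice."""
    y_by_tid = {tid: (iu, ru) for tid, iu, ru in ul_y}
    p_by_tid = {tid: iu for tid, iu, _ in prefix_ul} if prefix_ul else {}
    joined = []
    for tid, iu_x, _ in ul_x:
        if tid in y_by_tid:
            iu_y, ru_y = y_by_tid[tid]
            joined.append((tid, iu_x + iu_y - p_by_tid.get(tid, 0), ru_y))
    return joined


def search(lists, minutil, prefix_ul=None, out=None):
    """Depth-first search over extensions that share the same prefix."""
    out = {} if out is None else out
    for k, (itemset_x, ul_x) in enumerate(lists):
        utility = sum(iu for _, iu, _ in ul_x)
        if utility >= minutil:
            out[itemset_x] = utility
        # sum(iutil) + sum(rutil) bounds the utility of every extension of X.
        if utility + sum(ru for _, _, ru in ul_x) >= minutil:
            extensions = [
                (itemset_x + itemset_y[-1:], join(prefix_ul, ul_x, ul_y))
                for itemset_y, ul_y in lists[k + 1:]
            ]
            search(extensions, minutil, ul_x, out)
    return out


def mine_huis(db, minutil):
    db, twu = prune_by_twu(db, minutil)
    return search(build_utility_lists(db, twu), minutil)


if __name__ == "__main__":
    # Each transaction maps item -> utility (quantity x unit profit, precomputed).
    db = [{"a": 20, "c": 2}, {"a": 15, "c": 5}, {"b": 6, "d": 4}, {"c": 3, "d": 12}]
    print(mine_huis(db, 24))  # {('a',): 35, ('a', 'c'): 42}
```

In this toy example, item `d` starts with a TWU of 25, just above the threshold of 24, and only drops to 19 once `b` (TWU 10) has been removed; a single pruning pass would therefore keep `d`, which is why the pruning is repeated until no further item falls below the threshold.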

Acknowledgements
This work was supported in part by the National Key R&D Program of China under Grants 2017YFC1601800 and 2017YFC1601000, in part by the National Natural Science Foundation of China under Grants 62073155, 62002137, 62106088, and 61673194, in part by the “Blue Project” in Jiangsu Universities, China, in part by the Guangdong Provincial Key Laboratory under Grant 2020B121201001, and in part by the Advanced Research Project of Specialty Leading Person in Higher Vocational Colleges in Jiangsu Province.
Funding
This work was supported in part by the National Key R&D Program of China under Grants 2017YFC1601000 and 2017YFC1601800, and in part by the National Natural Science Foundation of China under Grants 62073155, 62106088, 61673194, and 61672263.
Author information
Contributions
Zaihe Cheng: Methodology. Wei Fang: Supervision. Wei Shen: Software, Writing (original draft preparation). Jerry Chun-Wei Lin, Bo Yuan: Resources, English language.
Ethics declarations
Ethics approval and consent to participate
This work does not contain any studies with human participants performed by any of the authors.
Consent for Publication
Informed consent was obtained from all individual participants included in this work.
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Cheng, Z., Fang, W., Shen, W. et al. An efficient utility-list based high-utility itemset mining algorithm. Appl Intell 53, 6992–7006 (2023). https://doi.org/10.1007/s10489-022-03850-4
DOI: https://doi.org/10.1007/s10489-022-03850-4