Abstract
Cross-Level High-Utility Itemsets Mining (CLHUIM) aims to discover interesting relationships between hierarchy levels by introducing a taxonomy of items. Because existing CLHUIM algorithms struggle with large search spaces, researchers have proposed mining the Top-K cross-level high-utility itemsets (CLHUIs). However, the results of these methods often contain redundant itemsets whose items differ widely in hierarchy level, together with a large proportion of itemsets at higher abstraction levels. As a result, they neglect detailed information and cannot report itemsets within a specified hierarchy range. They are also unable to handle dynamic transactional data. To address these problems, this paper proposes the Top-K Constrained Cross-Level High-Utility Itemsets Mining (TKCCLHM) algorithm, which efficiently mines Top-K itemsets across different hierarchy levels over data streams. First, a new hierarchical level concept is introduced to control the abstraction level of the introduced items, and Top-K itemsets are mined within a specific hierarchy range based on this concept. Second, a sliding-window-based data structure called the Sliding Window-based Utility Projection List (SUPL) is designed and combined with transaction projection techniques to mine CLHUIs efficiently. Finally, a Batch and Utility Hash Table (BUHT) structure that stores batch and (generalized) item utility information is proposed, along with a new threshold-raising strategy. Extensive experiments on six datasets with taxonomy information show that the proposed algorithm significantly improves runtime and scalability compared with state-of-the-art algorithms.
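To make the problem setting concrete, the following is a minimal brute-force sketch of level-constrained Top-K cross-level utility mining over a sliding window. It is not the TKCCLHM algorithm and uses neither SUPL nor BUHT. The taxonomy, profits, transactions, and all function names (`level`, `utility`, `mine_top_k`) are hypothetical and chosen only for illustration. It assumes the usual convention that the utility of a generalized item in a transaction is the sum of the utilities of its descendant leaf items in that transaction, and that an itemset may not contain both an item and one of its ancestors.

```python
import heapq
from collections import deque
from itertools import combinations

# Hypothetical taxonomy: child -> parent (None marks a top-level category).
PARENT = {"cola": "drinks", "juice": "drinks", "bread": "food",
          "drinks": None, "food": None}
# Hypothetical unit profits of the leaf items.
PROFIT = {"cola": 2, "juice": 3, "bread": 1}
# Hypothetical stream of transactions: leaf item -> purchased quantity.
TRANSACTIONS = [
    {"cola": 1, "bread": 2},
    {"juice": 2, "bread": 1},
    {"cola": 3, "juice": 1},
]

def ancestors_and_self(item):
    """Return the item followed by all of its ancestors in the taxonomy."""
    chain = []
    while item is not None:
        chain.append(item)
        item = PARENT[item]
    return chain

def level(item):
    """Hierarchy level of an item; level 1 is the most abstract."""
    return len(ancestors_and_self(item))

def utility(itemset, transaction):
    """Utility of a (generalized) itemset in one transaction, or 0 if absent."""
    total = 0
    for item in itemset:
        covered = [leaf for leaf in transaction
                   if item in ancestors_and_self(leaf)]
        if not covered:
            return 0  # every member of the itemset must occur
        total += sum(PROFIT[leaf] * transaction[leaf] for leaf in covered)
    return total

def related(a, b):
    """True if one item is an ancestor of (or equal to) the other."""
    return a in ancestors_and_self(b) or b in ancestors_and_self(a)

def mine_top_k(window, k, min_level, max_level, max_size=2):
    """Brute-force Top-K itemsets whose items lie in [min_level, max_level]."""
    items = sorted(i for i in PARENT if min_level <= level(i) <= max_level)
    heap, threshold = [], 0  # min-heap holding the current best k itemsets
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            if any(related(a, b) for a, b in combinations(itemset, 2)):
                continue
            u = sum(utility(itemset, t) for t in window)
            if u > threshold:
                heapq.heappush(heap, (u, itemset))
                if len(heap) > k:
                    heapq.heappop(heap)
                if len(heap) == k:
                    threshold = heap[0][0]  # raise the minimum-utility threshold
    return sorted(heap, reverse=True)

# Sliding window that keeps only the two most recent transactions.
window = deque(maxlen=2)
for transaction in TRANSACTIONS:
    window.append(transaction)
print(mine_top_k(window, k=2, min_level=1, max_level=2))
```

The sketch enumerates every candidate itemset, so it only illustrates the level constraint and the way a Top-K heap raises the internal utility threshold. A practical miner would add pruning and incremental window maintenance, which are the concerns the proposed structures address.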
Acknowledgements
This work was supported by the National Natural Science Foundation of China (62062004) and the Ningxia Natural Science Foundation Project (2023AAC03315).
Contributions
MH contributed to writing—review & editing, supervision, funding acquisition, and resources. SL contributed to conceptualization, methodology, software, and writing—original draft. ZG contributed to data curation, investigation, and visualization. DM contributed to formal analysis and project administration. AL contributed to validation.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Han, M., Liu, S., Gao, Z. et al. Mining Top-K constrained cross-level high-utility itemsets over data streams. Knowl Inf Syst 66, 2885–2924 (2024). https://doi.org/10.1007/s10115-023-02045-8
DOI: https://doi.org/10.1007/s10115-023-02045-8