Mining Top-K constrained cross-level high-utility itemsets over data streams

  • Regular Paper
Knowledge and Information Systems

Abstract

Cross-Level High-Utility Itemsets Mining (CLHUIM) aims to discover interesting relationships between hierarchy levels by introducing a taxonomy of items. Because current CLHUIM algorithms struggle with large search spaces, researchers have proposed mining Top-K cross-level high-utility itemsets (CLHUIs). However, the results of these methods often contain redundant itemsets that differ widely in hierarchy level, together with a large proportion of itemsets at higher abstraction levels. As a result, they neglect detailed information and cannot report itemsets within a specified hierarchy range. They are also unable to handle dynamic transactional data. To address these problems, this paper proposes the Top-K Constrained Cross-Level High-Utility Itemsets Mining (TKCCLHM) algorithm, which efficiently mines Top-K itemsets across different hierarchy levels over data streams. First, a new hierarchical level concept is introduced to control the abstraction level of the introduced items, and Top-K itemsets are mined within a specific hierarchy range based on this concept. Second, a sliding-window-based data structure called the Sliding Window-based Utility Projection List (SUPL) is designed and combined with transaction projection techniques to mine CLHUIs efficiently. Finally, a Batch and Utility Hash Table (BUHT) structure capable of storing batch and (generalized) item utility information is proposed, along with a new threshold-raising strategy. Extensive experiments on six datasets with taxonomy information show that the proposed algorithm achieves significant improvements in runtime and scalability over state-of-the-art algorithms.
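To make the problem setting concrete, the following is a minimal brute-force sketch of Top-K mining under a hierarchy-level constraint over a sliding window of batches. It is not the TKCCLHM algorithm: the SUPL and BUHT structures and the paper's threshold-raising strategy are not described in this preview, so the sketch substitutes exhaustive enumeration and a plain min-heap. The toy taxonomy, the depth-based definition of `level`, and all function names are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations
import heapq

# Hypothetical toy taxonomy: child -> parent (None marks a top-level category).
PARENT = {
    "cola": "drink", "juice": "drink",
    "bread": "food", "cake": "food",
    "drink": None, "food": None,
}

def level(item):
    """Assumed level: 1 for top-level categories, larger for more specific items."""
    depth = 1
    while PARENT[item] is not None:
        item = PARENT[item]
        depth += 1
    return depth

def ancestors(item):
    """Return the set of proper ancestors of an item in the taxonomy."""
    result = set()
    while PARENT[item] is not None:
        item = PARENT[item]
        result.add(item)
    return result

def extend(transaction):
    """Add generalized items; each gets the summed utility of its leaves present."""
    extended = dict(transaction)
    for leaf, util in transaction.items():
        for anc in ancestors(leaf):
            extended[anc] = extended.get(anc, 0) + util
    return extended

def top_k(window, k, min_level, max_level):
    """Exhaustively score itemsets whose items all lie in [min_level, max_level]."""
    utilities = {}
    for batch in window:
        for transaction in batch:
            ext = extend(transaction)
            items = sorted(i for i in ext if min_level <= level(i) <= max_level)
            for size in range(1, len(items) + 1):
                for combo in combinations(items, size):
                    # Skip itemsets that contain both an item and its ancestor.
                    if any(a in ancestors(b) for a in combo for b in combo):
                        continue
                    utilities[combo] = utilities.get(combo, 0) + sum(ext[i] for i in combo)
    heap = []  # min-heap of (utility, itemset); heap[0][0] is the current K-th best utility
    for itemset, util in utilities.items():
        if len(heap) < k:
            heapq.heappush(heap, (util, itemset))
        elif util > heap[0][0]:
            heapq.heapreplace(heap, (util, itemset))
    return sorted(heap, reverse=True)

# Sliding window holding only the two most recent batches (leaf item -> utility).
window = deque(maxlen=2)
batches = [
    [{"cola": 3, "bread": 2}],
    [{"juice": 4, "cake": 1}],
    [{"cola": 5, "cake": 2}],
]
for batch in batches:
    window.append(batch)  # the oldest batch is dropped automatically once the window is full

result = top_k(window, k=2, min_level=1, max_level=1)
print(result)  # [(12, ('drink', 'food')), (9, ('drink',))]
```

Restricting the level range to 1 returns only category-level itemsets, while a range of 2 to 2 would return only leaf-level itemsets. The exhaustive enumeration is exponential in transaction length and serves only to illustrate the definitions; an efficient miner would prune the search space rather than score every candidate.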



Acknowledgements

This work was supported by the National Natural Science Foundation of China (62062004) and the Ningxia Natural Science Foundation Project (2023AAC03315).

Author information

Contributions

MH contributed to writing—review & editing, supervision, funding acquisition, and resources. SL contributed to conceptualization, methodology, software, and writing—original draft. ZG contributed to data curation, investigation, and visualization. DM contributed to formal analysis and project administration. AL contributed to validation.

Corresponding author

Correspondence to Meng Han.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Han, M., Liu, S., Gao, Z. et al. Mining Top-K constrained cross-level high-utility itemsets over data streams. Knowl Inf Syst 66, 2885–2924 (2024). https://doi.org/10.1007/s10115-023-02045-8


  • DOI: https://doi.org/10.1007/s10115-023-02045-8
