Mining top-k high average-utility itemsets based on breadth-first search

Applied Intelligence

Abstract

High average-utility itemset mining is a subfield of data mining that has extensive practical applications. However, it is difficult for users to determine a proper minimum threshold because they cannot accurately predict the number of patterns mined at a given threshold. To address this issue, top-k high average-utility itemset mining has been proposed, where k is the number of high average-utility itemsets to be mined. In this paper, we design an effective algorithm (named ETAUIM) for finding top-k high average-utility itemsets. ETAUIM employs a breadth-first search strategy to efficiently explore the search space, and it utilizes a tighter upper bound instead of the average-utility upper bound to limit the search space. Additionally, ETAUIM removes irrelevant items during the mining process and utilizes an early abandoning strategy to terminate unnecessary join operations in advance. To evaluate the proposed algorithm, extensive experiments were conducted on six sparse datasets and two dense datasets. Four state-of-the-art algorithms were used for comparison. The experimental results show that ETAUIM has excellent performance and scalability. Moreover, ETAUIM always performs better on sparse datasets.
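
To make the problem statement concrete, the following is a minimal sketch of top-k high average-utility itemset mining by level-wise (breadth-first) enumeration. It is not ETAUIM: it enumerates every itemset exhaustively and omits the tighter upper bound, irrelevant-item removal, and early abandoning described above. The toy database `TRANSACTIONS` and the function names are hypothetical, and the average utility of an itemset is taken to be its total utility over the transactions that contain it, divided by the number of items in it.

```python
import heapq
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility
# in that transaction (for example, quantity times unit profit).
TRANSACTIONS = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 2},
    {"a": 5, "b": 2, "d": 8},
]


def average_utility(itemset, transactions):
    """Total utility of `itemset` in the transactions containing all of its
    items, divided by the number of items in the itemset."""
    total = 0
    for transaction in transactions:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] for item in itemset)
    return total / len(itemset)


def top_k_average_utility(transactions, k):
    """Return the k itemsets with the highest average utility.

    Candidates are visited level by level (all 1-itemsets, then all
    2-itemsets, and so on). A min-heap of size k holds the best itemsets
    seen so far; its smallest entry acts as the internal threshold that
    rises as better itemsets are found. No pruning is applied here.
    """
    items = sorted({item for t in transactions for item in t})
    heap = []  # min-heap of (average utility, itemset)
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            value = average_utility(itemset, transactions)
            if value == 0:
                continue  # itemset never occurs in the database
            if len(heap) < k:
                heapq.heappush(heap, (value, itemset))
            elif value > heap[0][0]:
                heapq.heapreplace(heap, (value, itemset))
    # Highest average utility first.
    return sorted(heap, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    for value, itemset in top_k_average_utility(TRANSACTIONS, 2):
        print(itemset, value)
```

The user supplies only k; the threshold is never specified directly but emerges as the smallest value in the heap. A practical algorithm would use that rising threshold together with an upper bound on average utility to skip candidates that cannot enter the top k.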

Data availability

The data utilized in this study are available from the SPMF Open-Source Data Mining Library.

Acknowledgements

This work is supported by the Natural Science Foundation of Zhejiang Province (LQ21F030010), the Natural Science Foundation of Ningbo (202003N4306), the Public Welfare Foundation of Ningbo (2021S108), the Key Technology R&D Program of Ningbo (2022Z149), and the Ningbo Science and Technology Special Innovation Projects (2021Z079, 2022Z235).

Author information

Contributions

Xuan Liu: Conceptualization, Methodology, Writing—original draft. Genlang Chen: Supervision, Project administration. Fangyu Wu: Data curation, Software. Shiting Wen: Validation, Writing—Review & Editing. Wanli Zuo: Writing—Review & Editing.

Corresponding author

Correspondence to Fangyu Wu.

Ethics declarations

Competing Interests

The authors declare that they have no competing interests related to this research.

Ethical and informed consent for data used

The data used in this study were obtained through publicly available sources, and no ethical or informed consent considerations were required.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, X., Chen, G., Wu, F. et al. Mining top-k high average-utility itemsets based on breadth-first search. Appl Intell 53, 29319–29337 (2023). https://doi.org/10.1007/s10489-023-05076-4
