Heuristically mining the top-k high-utility itemsets with cross-entropy optimization

Applied Intelligence

Abstract

Mining high-utility itemsets (HUIs) is one of the most important research topics in data mining because HUIs take into account both the non-binary frequency values of items in transactions and the different profit value of each item. However, setting an appropriate minimum utility threshold by trial and error is tedious for users. Mining the top-k HUIs, which requires no utility threshold, has therefore become an alternative to finding all HUIs. In this paper, we propose two algorithms for mining the top-k HUIs heuristically: top-k high-utility itemset mining based on the cross-entropy method (TKU-CE), and TKU-CE+. TKU-CE is based on cross-entropy and treats top-k HUI mining as a combinatorial optimization problem. Its main idea is to generate the top-k HUIs by gradually updating the probabilities of itemsets with high utility values. TKU-CE+ improves on TKU-CE in three respects. First, unpromising items are filtered out using a critical utility value, which reduces the computational burden in the initial stage. Second, a sample refinement strategy is applied in each iteration, which reduces the computational burden in the iterative stage. Finally, a smoothing mutation is proposed that randomly generates some new itemsets in addition to those carried over from previous iterations. This improves the diversity of the samples, so that more of the actual top-k HUIs can be discovered in fewer iterations. Compared with state-of-the-art algorithms, TKU-CE and TKU-CE+ are easy to implement and avoid the computational costs incurred by additional data structures and threshold-raising strategies. Extensive experimental results show that both algorithms are efficient, memory-saving, and scalable, and that they discover most of the actual top-k HUIs.
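The general idea described in the abstract, sampling itemsets from per-item probabilities and shifting those probabilities toward the highest-utility samples, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy database, the profit table, the function names `utility` and `ce_top_k`, and the parameters `n_samples`, `rho`, and `alpha` are all assumptions made for illustration, and none of the three TKU-CE+ strategies (critical utility filtering, sample refinement, smoothing mutation) is included. The update rule is the standard smoothed cross-entropy update for binary vectors.

```python
import random

# Toy transaction database (hypothetical): item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 4},
    {"a": 1, "b": 1, "d": 1},
]
# Hypothetical unit profit of each item.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}
ITEMS = sorted(PROFIT)


def utility(itemset):
    """Sum quantity * profit over every transaction containing the whole itemset."""
    total = 0
    for t in TRANSACTIONS:
        if all(i in t for i in itemset):
            total += sum(t[i] * PROFIT[i] for i in itemset)
    return total


def ce_top_k(k, n_samples=50, rho=0.2, alpha=0.7, iterations=30, seed=0):
    """Generic cross-entropy search over item bit vectors (illustrative only)."""
    rng = random.Random(seed)
    prob = [0.5] * len(ITEMS)          # probability that each item is selected
    seen = {}                          # every distinct itemset sampled -> utility
    n_elite = max(1, int(rho * n_samples))
    for _ in range(iterations):
        samples = []
        for _ in range(n_samples):
            bits = [1 if rng.random() < p else 0 for p in prob]
            itemset = frozenset(i for i, b in zip(ITEMS, bits) if b)
            u = utility(itemset) if itemset else 0
            samples.append((u, bits))
            if itemset:
                seen[itemset] = u
        # Keep the highest-utility fraction of the samples as the elite set.
        samples.sort(key=lambda s: s[0], reverse=True)
        elite = samples[:n_elite]
        # Smoothed update: move each probability toward its elite frequency.
        for j in range(len(ITEMS)):
            freq = sum(bits[j] for _, bits in elite) / n_elite
            prob[j] = alpha * freq + (1 - alpha) * prob[j]
    ranked = sorted(seen.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    for itemset, u in ce_top_k(3):
        print(sorted(itemset), u)
```

Because the search is heuristic, the returned list contains the best itemsets that were actually sampled, which need not be the exact top-k; no utility threshold is supplied at any point.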

Acknowledgments

This paper is a substantially extended version of our conference paper presented at IEA/AIE 2020. The authors would like to thank the anonymous reviewers for their valuable comments and suggestions, which helped to improve the quality of this paper. We would also like to thank Dr. Quang-Huy Duong for providing the source code of the kHMC algorithm. This work was supported by the National Natural Science Foundation of China (61977001), Great Wall Scholar Program (CIT&TCD20190305), and Beijing Urban Governance Research Center.

Author information

Corresponding author

Correspondence to Wei Song.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

This article is part of the Topical Collection on Emerging topics in Applied Intelligence selected from IEA/AIE2020

About this article

Cite this article

Song, W., Zheng, C., Huang, C. et al. Heuristically mining the top-k high-utility itemsets with cross-entropy optimization. Appl Intell 52, 17026–17041 (2022). https://doi.org/10.1007/s10489-021-02576-z

  • DOI: https://doi.org/10.1007/s10489-021-02576-z
