Heuristically mining the top-k high-utility itemsets with cross-entropy optimization

Song, Wei; Zheng, Chuanlong; Huang, Chaomin; Liu, Lu

doi:10.1007/s10489-021-02576-z

Published: 29 July 2021

Volume 52, pages 17026–17041, (2022)

Applied Intelligence

Wei Song ORCID: orcid.org/0000-0003-0649-8850¹,
Chuanlong Zheng¹,
Chaomin Huang¹ &
…
Lu Liu

Abstract

Mining high-utility itemsets (HUIs) is one of the most important research topics in data mining because HUIs consider non-binary frequency values of items in transactions and different profit values for each item. However, setting appropriate minimum utility thresholds by trial and error is a tedious process for users. Thus, mining the top-k HUIs without setting a utility threshold is becoming an alternative to determining all the HUIs. In this paper, we propose two algorithms, called the top-k high-utility itemset mining based on cross-entropy method (TKU-CE) and TKU-CE+, for mining the top-k HUIs heuristically. The TKU-CE algorithm is based on cross-entropy, and implements top-k HUI mining using combinatorial optimization. The main idea of TKU-CE is to generate the top-k HUIs by gradually updating the probabilities of itemsets with high utility values. TKU-CE+ optimizes TKU-CE in three respects. First, unpromising items are filtered using a critical utility value, to reduce the computational burden in the initial stage. Second, a sample refinement strategy is used in each iteration, to reduce the computational burden in the iterative stage. Finally, smoothing mutation is proposed, to randomly generate some new itemsets in addition to those from previous iterations. Consequently, the diversity of samples is improved, so that more actual top-k HUIs can be discovered with fewer iterations. Compared with state-of-the-art algorithms, TKU-CE and TKU-CE+ are easy to implement and avoid the computational costs that would be incurred by additional data structures and threshold-raising strategies. Extensive experimental results show that both algorithms are efficient, memory-saving, and scalable, and can discover most of the actual top-k HUIs.
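The general idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' TKU-CE or TKU-CE+ implementation: the toy database `DB`, the profit table `PROFIT`, the function names, and all parameter values (`n_samples`, `rho`, `alpha`, `iterations`) are assumptions made for illustration. The sketch encodes an itemset as a bit vector, samples itemsets from a per-item probability vector, and moves that vector toward the highest-utility samples using the standard smoothed cross-entropy update. The smoothing parameter `alpha` here is that standard update and should not be read as the paper's smoothing mutation; critical-utility filtering and sample refinement are not modeled. A brute-force `exact_top_k` is included only so the heuristic result can be compared with the actual top-k on a tiny item set.

```python
import random
from itertools import combinations

# Toy transaction database (hypothetical): item -> purchased quantity.
DB = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3, "d": 1},
    {"b": 1, "c": 2, "e": 4},
    {"a": 1, "b": 1, "c": 1, "d": 2},
    {"c": 2, "e": 1},
]
# Hypothetical unit profit of each item.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 4, "e": 3}
ITEMS = sorted(PROFIT)


def utility(itemset):
    """Sum quantity * profit over every transaction containing the whole itemset."""
    total = 0
    for tx in DB:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * PROFIT[item] for item in itemset)
    return total


def rank(scored, k):
    """Sort (itemset, utility) pairs by descending utility and keep the first k."""
    return sorted(scored, key=lambda kv: (-kv[1], sorted(kv[0])))[:k]


def exact_top_k(k):
    """Brute-force reference; feasible only for a handful of items."""
    scored = []
    for r in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, r):
            u = utility(combo)
            if u > 0:
                scored.append((frozenset(combo), u))
    return rank(scored, k)


def ce_top_k(k, n_samples=50, rho=0.2, alpha=0.7, iterations=20, seed=0):
    """Heuristic top-k search with a cross-entropy style probability update."""
    rng = random.Random(seed)
    probs = [0.5] * len(ITEMS)      # probability that each item is selected
    best = {}                       # every positive-utility itemset seen so far
    n_elite = max(1, int(rho * n_samples))
    for _ in range(iterations):
        samples = []
        for _ in range(n_samples):
            bits = [1 if rng.random() < p else 0 for p in probs]
            itemset = frozenset(i for i, b in zip(ITEMS, bits) if b)
            u = utility(itemset)
            samples.append((u, bits))
            if u > 0:
                best[itemset] = u
        # Keep the elite (highest-utility) samples of this iteration.
        samples.sort(key=lambda s: s[0], reverse=True)
        elite = samples[:n_elite]
        # Smoothed update: move each probability toward the elite frequency.
        for j in range(len(ITEMS)):
            freq = sum(bits[j] for _, bits in elite) / n_elite
            probs[j] = alpha * freq + (1 - alpha) * probs[j]
    return rank(best.items(), k)


if __name__ == "__main__":
    print("heuristic:", ce_top_k(3))
    print("exact:    ", exact_top_k(3))
```

Because the search is randomized, the heuristic list is not guaranteed to match the exact one; on a realistic database the brute-force reference is infeasible, which is the motivation for a heuristic approach in the first place.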

Acknowledgments

This paper is a substantially extended version of our conference paper presented at IEA/AIE 2020. The authors would like to thank the anonymous reviewers for their valuable comments and suggestions, which helped to improve the quality of this paper. We would also like to thank Dr. Quang-Huy Duong for providing the source code of the kHMC algorithm. This work was supported by the National Natural Science Foundation of China (61977001), Great Wall Scholar Program (CIT&TCD20190305), and Beijing Urban Governance Research Center.

Author information

Authors and Affiliations

School of Information Science and Technology, North China University of Technology, Beijing, China
Wei Song, Chuanlong Zheng & Chaomin Huang

Corresponding author

Correspondence to Wei Song.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Emerging topics in Applied Intelligence selected from IEA/AIE2020

About this article

Cite this article

Song, W., Zheng, C., Huang, C. et al. Heuristically mining the top-k high-utility itemsets with cross-entropy optimization. Appl Intell 52, 17026–17041 (2022). https://doi.org/10.1007/s10489-021-02576-z

Accepted: 28 May 2021
Published: 29 July 2021
Issue Date: December 2022
DOI: https://doi.org/10.1007/s10489-021-02576-z
