Top-k high average-utility itemsets mining with effective pruning strategies

Wu, Ronghui; He, Zhan

doi:10.1007/s10489-018-1155-9

Published: 03 March 2018

Applied Intelligence, volume 48, pages 3429–3445 (2018)

Abstract

High average-utility itemset (HAUI) mining has recently attracted interest in the data mining field because of its balanced utility measure, which considers not only the profits and quantities of items but also the lengths of itemsets. Although several algorithms have been designed for HAUI mining in recent years, it is hard for users to choose a minimum average-utility threshold that lets the algorithms run efficiently and controls the size of the mining result precisely. In this paper, we address this issue by introducing a framework for top-k HAUI mining, in which the user specifies \(k\), the desired number of high average-utility itemsets, instead of a minimum average-utility threshold. An efficient list-based algorithm named TKAU is proposed to mine the top-k high average-utility itemsets in a single phase. TKAU introduces two novel strategies, EMUP and EA, to avoid performing costly join operations when calculating the utilities of itemsets. In addition, three strategies, RIU, CAD, and EPBF, are incorporated to raise its internal minimum average-utility threshold effectively and thus reduce the search space. Extensive experiments on both real and synthetic datasets show that the proposed algorithm has excellent performance and scalability.
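
To make the problem statement concrete, the following is a minimal brute-force sketch of top-k average-utility mining. It is not the TKAU algorithm and uses none of the EMUP, EA, RIU, CAD, or EPBF strategies, whose details are not given in this abstract. It assumes the usual definition of average utility (the itemset's utility in each transaction that contains it, summed and divided by the itemset's length); the toy data and the names `PROFIT`, `DB`, `average_utility`, and `top_k_haui` are hypothetical.

```python
import heapq
from itertools import combinations

# Hypothetical toy data: unit profit per item, and purchased quantities per transaction.
PROFIT = {"a": 5, "b": 2, "c": 1}
DB = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1},
]


def average_utility(itemset, db=DB, profit=PROFIT):
    """Utility of the itemset summed over transactions containing it, divided by its length."""
    total = 0
    for tx in db:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total / len(itemset)


def top_k_haui(k, db=DB, profit=PROFIT):
    """Enumerate every itemset and keep the k best in a min-heap.

    The heap's smallest entry plays the role of an internal threshold that
    rises as better itemsets are found. No search-space pruning is done here.
    """
    items = sorted(profit)
    heap = []  # entries are (average utility, itemset)
    threshold = 0.0
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            au = average_utility(itemset, db, profit)
            if au == 0:
                continue  # itemset occurs in no transaction
            if len(heap) < k:
                heapq.heappush(heap, (au, itemset))
            elif au > heap[0][0]:
                heapq.heapreplace(heap, (au, itemset))
            if len(heap) == k:
                threshold = heap[0][0]
    return sorted(heap, reverse=True), threshold


if __name__ == "__main__":
    result, threshold = top_k_haui(3)
    for au, itemset in result:
        print(itemset, au)
    print("final internal threshold:", threshold)
```

The sketch enumerates all itemsets, which is exponential in the number of items. A practical algorithm would use the rising threshold to discard candidates early, which is the role the abstract assigns to its threshold-raising and pruning strategies.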

Acknowledgements

This work was partially funded by the National Natural Science Foundation of China (Grant Nos. 61370171, 61672214, 60973082, 11171369).

Author information

Authors and Affiliations

College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
Ronghui Wu & Zhan He

Corresponding author

Correspondence to Zhan He.

About this article

Cite this article

Wu, R., He, Z. Top-k high average-utility itemsets mining with effective pruning strategies. Appl Intell 48, 3429–3445 (2018). https://doi.org/10.1007/s10489-018-1155-9

Issue Date: October 2018
