Generalized maximal utility for mining high average-utility itemsets

Song, Wei; Liu, Lu; Huang, Chaomin

doi:10.1007/s10115-021-01614-z

Regular Paper
Published: 21 October 2021

Knowledge and Information Systems, volume 63, pages 2947–2967 (2021)

Abstract

Mining high average-utility itemsets (HAUIs) is a promising research topic in data mining because, in contrast to high utility itemsets, they are not biased toward long itemsets. Regardless of what upper bounds and pruning strategies are used, most existing HAUI mining algorithms are founded on the concept of maximal utility, namely the highest utility of a single item in each transaction. In this paper, we study this problem by generalizing the typical maximal utility and average-utility upper bound from a single item to an itemset, and propose an efficient HAUI mining algorithm based on generalized maximal utility (HAUIM-GMU). For this algorithm, we first propose the concepts of generalized maximal utility and the generalized average-utility upper bound, and discuss how the proposed upper bound can be made tighter to generate fewer candidates. A new pruning strategy is then proposed based on the concept of support, and this is shown to be effective for filtering out unpromising itemsets. The final algorithm is described in detail. Extensive experimental results show that the HAUIM-GMU algorithm outperforms existing state-of-the-art algorithms.
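
To make the baseline concepts in the abstract concrete, the following is a minimal Python sketch of the conventional (single-item) definitions only: the average utility of an itemset, the maximal utility of a transaction, and the resulting average-utility upper bound. It does not implement HAUIM-GMU, since the abstract does not define the generalized bound. The toy database, the function names (`average_utility`, `auub`, `mine_hauis`), and the threshold are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility
# in that transaction (illustrative values, not taken from the paper).
transactions = [
    {"a": 4, "b": 2, "c": 6},
    {"a": 2, "c": 3},
    {"b": 8, "d": 1},
]


def average_utility(itemset, db):
    """Sum, over transactions containing the itemset, of the itemset's
    utility in that transaction divided by the itemset's length."""
    items = frozenset(itemset)
    total = 0.0
    for t in db:
        if items.issubset(t):
            total += sum(t[i] for i in items) / len(items)
    return total


def auub(itemset, db):
    """Conventional average-utility upper bound: sum of the maximal
    (single-item) utility of every transaction containing the itemset."""
    items = frozenset(itemset)
    return sum(max(t.values()) for t in db if items.issubset(t))


def mine_hauis(db, min_au):
    """Brute-force enumeration, for illustration only. Itemsets whose
    upper bound is below the threshold are skipped without computing
    their exact average utility."""
    all_items = sorted({i for t in db for i in t})
    result = {}
    for k in range(1, len(all_items) + 1):
        for candidate in combinations(all_items, k):
            if auub(candidate, db) < min_au:
                continue  # upper bound too low: cannot be a HAUI
            au = average_utility(candidate, db)
            if au >= min_au:
                result[candidate] = au
    return result


hauis = mine_hauis(transactions, min_au=7)
```

In this toy data the itemset {a, c} has average utility 7.5 and upper bound 9, so the bound overestimates the true value, which is why tighter bounds generate fewer candidates. A practical miner would also use the bound's downward-closure property to prune whole branches of the search space rather than testing every itemset as this sketch does.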

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions, which helped to improve the quality of this paper. This work was partially supported by the National Natural Science Foundation of China (61977001) and the Great Wall Scholar Program (CIT & TCD20190305).

Author information

Authors and Affiliations

School of Information Science and Technology, North China University of Technology, Beijing, 100144, China
Wei Song, Lu Liu & Chaomin Huang
Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing, 100144, China
Wei Song

Corresponding author

Correspondence to Wei Song.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Song, W., Liu, L. & Huang, C. Generalized maximal utility for mining high average-utility itemsets. Knowl Inf Syst 63, 2947–2967 (2021). https://doi.org/10.1007/s10115-021-01614-z

Received: 17 January 2019
Revised: 18 September 2021
Accepted: 02 October 2021
Published: 21 October 2021
Issue Date: November 2021
DOI: https://doi.org/10.1007/s10115-021-01614-z
