High-utility and diverse itemset mining

Verma, Amit; Dawar, Siddharth; Kumar, Raman; Navathe, Shamkant; Goyal, Vikram

doi:10.1007/s10489-020-02063-x

Published: 05 January 2021

Applied Intelligence, volume 51, pages 4649–4663 (2021)

Amit Verma¹, Siddharth Dawar², Raman Kumar¹, Shamkant Navathe³ & Vikram Goyal² (ORCID: orcid.org/0000-0003-0769-6381)

Abstract

High-utility Itemset Mining (HUIM) finds patterns in a transaction database whose utility is no less than a user-defined threshold. The utility of an itemset is defined as the sum of the utilities of its items. The utility notion enables a data analyst to associate a profit score with each item and, through the items, with a pattern. We extend the notion of high utility with diversity to define a new pattern type called the High-utility and Diverse (HUD) pattern. The diversity of a pattern captures the extent of the different categories covered by the items in the pattern. One application of diverse patterns lies in recommendation, where a system can recommend to a customer a set of items from a new class based on her previously bought items. Our notion of diversity is easy to compute and also captures the basic essence of a previously proposed diversity notion. The existing algorithm for computing frequent-diverse patterns works in two phases: the first phase computes the frequent patterns, and the second phase selects the diverse patterns from among them. In this paper, we give an integrated algorithm that efficiently computes high-utility and diverse patterns in a single phase. Our experimental study shows that the proposed algorithm is more efficient than a 2-phase algorithm that extracts high-utility itemsets in the first phase and selects the diverse itemsets from them in the second phase.
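
The definitions in the abstract can be illustrated with a small sketch. It assumes the conventional HUIM setting, in which an item's utility in a transaction is its purchased quantity times a per-unit profit and an itemset's utility is summed over the transactions that contain all of its items. As a stand-in for diversity it counts the distinct categories an itemset covers; the paper's exact measure is not given in the abstract, so this is only a placeholder. All data, names, and thresholds are hypothetical, and the brute-force procedure below is the kind of 2-phase baseline the abstract compares against, not the authors' single-phase algorithm.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"bread": 2, "milk": 1, "pen": 3},
    {"bread": 1, "pen": 1},
    {"milk": 2, "soap": 1},
    {"bread": 1, "milk": 1, "soap": 2},
]
# Assumed per-unit profit and category for each item.
profit = {"bread": 1, "milk": 2, "pen": 1, "soap": 3}
category = {
    "bread": "food",
    "milk": "food",
    "pen": "stationery",
    "soap": "toiletries",
}


def utility(itemset, db):
    """Sum quantity * profit of the itemset's items over every
    transaction that contains all of them."""
    total = 0
    for t in db:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def diversity(itemset):
    """Placeholder measure (an assumption): number of distinct
    categories covered by the itemset."""
    return len({category[item] for item in itemset})


def two_phase_hud(db, min_util, min_div):
    """Naive 2-phase baseline: mine high-utility itemsets first,
    then keep only the sufficiently diverse ones."""
    items = sorted({item for t in db for item in t})
    # Phase 1: brute-force enumeration of all high-utility itemsets.
    high_utility = []
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            u = utility(candidate, db)
            if u >= min_util:
                high_utility.append((candidate, u))
    # Phase 2: select the diverse itemsets among the high-utility ones.
    return [(c, u) for c, u in high_utility if diversity(c) >= min_div]


if __name__ == "__main__":
    for itemset, u in two_phase_hud(transactions, min_util=7, min_div=2):
        print(itemset, u, diversity(itemset))
```

In this toy data, `("bread", "milk")` reaches the utility threshold of 7 but covers only one category, so the second phase discards it. Phase 1 has already paid the cost of evaluating it, which is the overhead that an integrated single-phase approach aims to avoid.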

Acknowledgements

This work was supported in part by Infosys Centre for Artificial Intelligence, Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi), and Visvesvaraya Ph.D. scheme for Electronics and IT.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, IKGPTU, Kapurthala, Punjab, India
Amit Verma & Raman Kumar
Department of Computer Science, IIIT-Delhi, New Delhi, India
Siddharth Dawar & Vikram Goyal
College of Computing, Georgia Institute of Technology, Atlanta, Georgia, USA
Shamkant Navathe

Corresponding author

Correspondence to Vikram Goyal.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Verma, A., Dawar, S., Kumar, R. et al. High-utility and diverse itemset mining. Appl Intell 51, 4649–4663 (2021). https://doi.org/10.1007/s10489-020-02063-x

Accepted: 04 November 2020
Published: 05 January 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s10489-020-02063-x
