Abstract
High-utility Itemset Mining (HUIM) finds patterns in a transaction database whose utility is no less than a user-defined threshold. The utility of an itemset is defined as the sum of the utilities of its items. The notion of utility lets a data analyst associate a profit score with each item and thereby with each pattern. We extend high-utility patterns with a notion of diversity to define a new pattern type called the High-utility and Diverse pattern (HUD). The diversity of a pattern captures the extent to which the items in the pattern cover different categories. One application of diverse patterns is recommendation, where a system can recommend to a customer a set of items from a new class based on her previously bought items. Our notion of diversity is easy to compute and also captures the essence of a previously proposed diversity notion. The existing algorithm for computing frequent-diverse patterns works in two phases: the first phase computes the frequent patterns, and the second phase filters them to retain the diverse ones. In this paper, we give an integrated algorithm that efficiently computes high-utility and diverse patterns in a single phase. Our experimental study shows that the proposed algorithm is very efficient compared with a 2-phase algorithm that extracts high-utility itemsets in the first phase and filters them for diversity in the second phase.
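To make the definitions concrete, the following is a minimal sketch of the 2-phase baseline described above, not the integrated single-phase algorithm of the paper. It rests on several assumptions that the abstract does not state: the utility of an item in a transaction is taken as quantity times unit profit (the usual HUIM convention), diversity is taken as the number of distinct categories covered by the pattern, and the toy data, the function names `utility`, `diversity`, and `two_phase_hud`, and the thresholds `min_util` and `min_div` are all hypothetical. Candidate enumeration is brute force and only suitable for tiny examples.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "laptop": 1},
    {"milk": 3, "laptop": 1, "pen": 2},
]
# Assumed per-unit profit and a flat item -> category mapping.
profit = {"bread": 1, "milk": 2, "laptop": 50, "pen": 1}
category = {
    "bread": "food",
    "milk": "food",
    "laptop": "electronics",
    "pen": "stationery",
}


def utility(itemset, transactions, profit):
    """Sum the item utilities (quantity * profit) of the itemset over
    every transaction that contains all of its items."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def diversity(itemset, category):
    """Assumed diversity measure: number of distinct categories covered."""
    return len({category[item] for item in itemset})


def two_phase_hud(transactions, profit, category, min_util, min_div):
    """Naive 2-phase baseline: mine high-utility itemsets first,
    then keep only those that are also diverse."""
    items = sorted(profit)
    # Phase 1: brute-force enumeration of all high-utility itemsets.
    high_utility = []
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            u = utility(candidate, transactions, profit)
            if u >= min_util:
                high_utility.append((candidate, u))
    # Phase 2: retain the itemsets that meet the diversity threshold.
    return {
        frozenset(c): u
        for c, u in high_utility
        if diversity(c, category) >= min_div
    }


if __name__ == "__main__":
    result = two_phase_hud(transactions, profit, category, min_util=50, min_div=2)
    for pattern, u in sorted(result.items(), key=lambda kv: kv[1]):
        print(sorted(pattern), u)
```

In this toy run the single item `laptop` passes the utility threshold in phase 1 but is discarded in phase 2 because it covers only one category. Such wasted phase-1 work is what a single-phase approach would aim to avoid.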
Acknowledgements
This work was supported in part by Infosys Centre for Artificial Intelligence, Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi), and Visvesvaraya Ph.D. scheme for Electronics and IT.
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Verma, A., Dawar, S., Kumar, R. et al. High-utility and diverse itemset mining. Appl Intell 51, 4649–4663 (2021). https://doi.org/10.1007/s10489-020-02063-x
DOI: https://doi.org/10.1007/s10489-020-02063-x