High-utility and diverse itemset mining

Applied Intelligence

Abstract

High-utility Itemset Mining (HUIM) finds the patterns in a transaction database whose utility is no less than a user-defined threshold. The utility of an itemset is defined as the sum of the utilities of its items. The utility notion enables a data analyst to associate a profit score with each item and thereby with a pattern. We extend the notion of high utility with diversity to define a new pattern type called the High-utility and Diverse (HUD) pattern. The diversity of a pattern captures the extent of the different categories covered by the items in the pattern. One application of diverse patterns lies in the recommendation task, where a system can recommend to a customer a set of items from a new class based on her previously bought items. Our notion of diversity is easy to compute and also captures the basic essence of a previously proposed diversity notion. The existing algorithm for computing frequent-diverse patterns works in two phases: the first phase computes the frequent patterns, and the second phase selects the diverse patterns from among them. In this paper, we give an integrated algorithm that efficiently computes high-utility and diverse patterns in a single phase. Our experimental study shows that the proposed algorithm is more efficient than a two-phase algorithm that extracts high-utility itemsets in the first phase and selects the diverse itemsets in the second phase.
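The definitions in the abstract can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the algorithm proposed in the paper: the toy transactions, the names `utility`, `diversity`, and `two_phase_hud`, and the thresholds are all hypothetical. Utility follows the standard HUIM convention (quantity times unit profit, summed over the transactions that contain the whole itemset). Diversity is assumed here to be the number of distinct categories covered, which is a simplification, since the abstract does not give the paper's exact measure. The function enumerates every itemset by brute force and mimics the two-phase baseline described above.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to its purchased quantity.
transactions = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 2, "pen": 3},
    {"milk": 1, "pen": 1},
]
unit_profit = {"bread": 3, "milk": 2, "pen": 1}  # assumed profit per unit
category = {"bread": "food", "milk": "food", "pen": "stationery"}  # assumed flat categories


def utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def diversity(itemset, category):
    """Assumed measure: the number of distinct categories the itemset covers."""
    return len({category[item] for item in itemset})


def two_phase_hud(transactions, unit_profit, category, min_util, min_div):
    """Brute-force two-phase baseline: mine high-utility itemsets, then keep the diverse ones."""
    items = sorted(unit_profit)
    # Phase 1: keep every itemset whose utility reaches the threshold.
    high_utility = [
        frozenset(candidate)
        for size in range(1, len(items) + 1)
        for candidate in combinations(items, size)
        if utility(candidate, transactions, unit_profit) >= min_util
    ]
    # Phase 2: among those, keep the itemsets that are diverse enough.
    return [s for s in high_utility if diversity(s, category) >= min_div]


result = two_phase_hud(transactions, unit_profit, category, min_util=10, min_div=2)
print(result)
```

With these toy values, {bread, milk} has the highest utility (15) but covers a single category, so it is discarded in the second phase. A single-phase method would aim to avoid generating such candidates in the first place.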

Acknowledgements

This work was supported in part by Infosys Centre for Artificial Intelligence, Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi), and Visvesvaraya Ph.D. scheme for Electronics and IT.

Author information

Corresponding author

Correspondence to Vikram Goyal.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Verma, A., Dawar, S., Kumar, R. et al. High-utility and diverse itemset mining. Appl Intell 51, 4649–4663 (2021). https://doi.org/10.1007/s10489-020-02063-x

  • DOI: https://doi.org/10.1007/s10489-020-02063-x
