
An efficient fast algorithm for discovering closed+ high utility itemsets

Applied Intelligence

Abstract

In recent years, mining high utility itemsets (HUIs) from transactional databases has become one of the most active research topics in data mining, owing to its wide range of applications: online e-commerce data analysis, identifying interesting patterns in biomedical data, and cross-marketing solutions in retail business. It aims to discover itemsets with high utilities efficiently by considering the quantity of each item in a transaction and the profit value of each item. However, it produces a tremendous number of HUIs, which makes the extracted patterns harder to analyze and degrades the performance of mining methods. Mining the set of closed+ high utility itemsets (CHUIs) addresses this issue, as it is a lossless and condensed representation of all HUIs. In this paper, we present a new scalable and efficient algorithm for finding CHUIs in a transactional database, called CHUM (Closed+ High Utility itemset Miner). The proposed algorithm adopts a specially designed vertical representation of the database to speed up the generation of itemset closures and to compute their utility information without accessing the database. It also uses an item co-occurrence strategy to further reduce the number of intersections that must be performed. Experiments on various sparse and dense datasets show that our algorithm scales well and outperforms the existing state-of-the-art CHUD (Closed+ High Utility itemset Discovery) algorithm.
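
To make the problem statement concrete, the following brute-force Python sketch enumerates closed+ high utility itemsets on a toy database. It is not the CHUM algorithm: the data (`PROFIT`, `DATABASE`), the function names, and the threshold are hypothetical, and closedness is taken in the usual sense (no proper superset occurs in exactly the same transactions), which is an assumption here since the abstract does not restate the definition.

```python
from itertools import combinations

# Hypothetical toy data: per-unit profit of each item, and the purchased
# quantity of each item in every transaction (tid -> {item: quantity}).
PROFIT = {"a": 5, "b": 2, "c": 1}
DATABASE = {
    1: {"a": 1, "b": 2},
    2: {"a": 2, "b": 1, "c": 3},
    3: {"b": 4, "c": 1},
}


def tidset(itemset):
    """Ids of the transactions that contain every item of the itemset."""
    return frozenset(
        tid for tid, t in DATABASE.items() if all(i in t for i in itemset)
    )


def utility(itemset):
    """Sum of quantity * profit over all transactions containing the itemset."""
    return sum(
        DATABASE[tid][i] * PROFIT[i] for tid in tidset(itemset) for i in itemset
    )


def closed_high_utility_itemsets(min_util):
    """Brute force: keep itemsets that are closed and reach min_util."""
    items = sorted(PROFIT)
    result = {}
    for size in range(1, len(items) + 1):
        for x in combinations(items, size):
            tids = tidset(x)
            if not tids:
                continue  # the itemset never occurs
            # Closed: adding any single missing item changes the tidset.
            closed = all(tidset(x + (i,)) != tids for i in items if i not in x)
            if closed and utility(x) >= min_util:
                result[x] = utility(x)
    return result
```

With `min_util = 15`, the itemset {a} is a HUI (utility 15) but is not closed, because {a, b} occurs in exactly the same transactions; the sketch therefore reports only {a, b} and {a, b, c}.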

Author information

Corresponding author

Correspondence to A. Goswami.

Appendices

Appendix A: Proof of Theorem 3

Let Y be an extension of X. Then \(X \subset Y\). If \(Tidlist(X)\) denotes the set of tids of \(gUL(X)\) and \(Tidlist(Y)\) the set of tids of \(gUL(Y)\), then \(Tidlist(Y) \subseteq Tidlist(X)\). Thus, for every transaction \(t \supseteq Y\), we have \(\widetilde {Y/X}\subseteq \widetilde {t/X}\). Furthermore, we have,

$$\begin{array}{@{}rcl@{}} u(Y,t) &=& u(X,t)+u(\widetilde{Y/X},t)=u(X,t)+\underset{i\in \widetilde{Y/X}}{\sum}u(i,t)\\ &\leq& u(X,t)+\underset{i\in \widetilde{t/X}}{\sum}u(i, t)\\ &=& u(X,t)+\widetilde{ru}(X,t). \end{array} $$

Hence,

$$\begin{array}{@{}rcl@{}} u(Y) &=& \underset{t\in Tidlist(Y)}{\sum}{u(Y,t)}\\ &\leq& \underset{t\in Tidlist(Y)}{\sum}\left(u(X,t)+\widetilde{ru}(X,t)\right)\\ &\leq& \underset{t\in Tidlist(X)}{\sum}\left(u(X,t)+\widetilde{ru}(X,t)\right) < min\_util. \end{array} $$

As a result, the itemset Y is not a high utility itemset and the theorem follows.
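
The bound used in this proof can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the paper's implementation: the toy data, the lexicographic item order `ORDER`, and all function names are hypothetical, and \(\widetilde{ru}(X,t)\) is assumed to be the summed utility of the items of t that follow every item of X in that order.

```python
from itertools import combinations

# Hypothetical toy data: unit profits and per-transaction quantities.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}
DATABASE = {
    1: {"a": 1, "b": 2},
    2: {"a": 2, "b": 1, "c": 3},
    3: {"b": 4, "c": 1, "d": 1},
}
ORDER = sorted(PROFIT)  # assumed total order on items
RANK = {item: pos for pos, item in enumerate(ORDER)}


def tidlist(itemset):
    """Ids of the transactions that contain the whole itemset."""
    return [tid for tid, t in DATABASE.items() if all(i in t for i in itemset)]


def u(itemset, tid):
    """Utility of the itemset in one transaction."""
    return sum(DATABASE[tid][i] * PROFIT[i] for i in itemset)


def ru(itemset, tid):
    """Remaining utility: items of the transaction that come after the itemset."""
    last = max(RANK[i] for i in itemset)
    return sum(q * PROFIT[i] for i, q in DATABASE[tid].items() if RANK[i] > last)


def utility(itemset):
    return sum(u(itemset, tid) for tid in tidlist(itemset))


def upper_bound(itemset):
    """Sum over Tidlist(X) of u(X, t) + ru(X, t)."""
    return sum(u(itemset, tid) + ru(itemset, tid) for tid in tidlist(itemset))


def extensions(itemset):
    """All itemsets obtained by appending items that follow X in ORDER."""
    tail = ORDER[max(RANK[i] for i in itemset) + 1:]
    for size in range(1, len(tail) + 1):
        for extra in combinations(tail, size):
            yield tuple(itemset) + extra


def bound_holds_everywhere():
    """Check u(Y) <= upper_bound(X) for every X and every extension Y of X."""
    for size in range(1, len(ORDER) + 1):
        for x in combinations(ORDER, size):
            if any(utility(y) > upper_bound(x) for y in extensions(x)):
                return False
    return True
```

For example, the bound for {b} is 21, so for any `min_util` above 21 no extension of {b} needs to be examined.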

Appendix B: Proof of Theorem 4

Let Y be an extension of X. Clearly, \(X \subset Y\). If \(Tidlist(X)\) denotes the set of tids of \(gUL(X)\) and \(Tidlist(Y)\) the set of tids of \(gUL(Y)\), then \(Tidlist(Y) \subseteq Tidlist(X)\). Thus, the promising utility value of Y in \(\mathcal {D}\) is given by

$$\begin{array}{@{}rcl@{}} PU(Y) &=& \underset{t\in Tidlist(Y)}{\sum}(u(Y, t)+\widetilde{ru}(Y, t))\\ &=& \underset{t\in Tidlist(Y)}{\sum}(u(X, t)+u(Y\setminus X, t)+\widetilde{ru}(Y, t))\\ &\leq& \underset{t\in Tidlist(Y)}{\sum}(u(X, t)+\widetilde{ru}(X, t)), \quad \text{since } POST\_SET(Y)\subseteq POST\_SET(X)\\ &\leq& \underset{t\in Tidlist(X)}{\sum}(u(X, t)+\widetilde{ru}(X, t))\\ &=& PU(X). \end{array} $$

Hence, the promising utility value of the itemset Y is less than or equal to the promising utility value of X.
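
Because the utility of an itemset never exceeds its promising utility, this monotonicity lets a depth-first search discard an entire subtree once \(PU(X) < min\_util\). A minimal Python sketch of such a pruned search follows. It is a generic high utility itemset search, not CHUM (it performs no closure computation), and the toy data, the item order, and all names are assumptions made for illustration.

```python
# Hypothetical toy data: unit profits and per-transaction quantities.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}
DATABASE = {
    1: {"a": 1, "b": 2},
    2: {"a": 2, "b": 1, "c": 3},
    3: {"b": 4, "c": 1, "d": 1},
}
ORDER = sorted(PROFIT)  # assumed total order on items
RANK = {item: pos for pos, item in enumerate(ORDER)}


def utility_and_pu(itemset):
    """Return (u(X), PU(X)) for an itemset given as a tuple sorted by ORDER."""
    last = RANK[itemset[-1]]
    util = remaining = 0
    for t in DATABASE.values():
        if all(i in t for i in itemset):
            util += sum(t[i] * PROFIT[i] for i in itemset)
            # Remaining utility: items of t that come after the itemset.
            remaining += sum(q * PROFIT[i] for i, q in t.items() if RANK[i] > last)
    return util, util + remaining


def mine_high_utility(min_util):
    """Depth-first search that stops extending X when PU(X) < min_util."""
    found = {}
    visited = 0

    def extend(prefix, start):
        nonlocal visited
        for pos in range(start, len(ORDER)):
            candidate = prefix + (ORDER[pos],)
            visited += 1
            util, pu = utility_and_pu(candidate)
            if util >= min_util:
                found[candidate] = util
            if pu >= min_util:  # otherwise no extension can reach min_util
                extend(candidate, pos + 1)

    extend((), 0)
    return found, visited
```

Calling `mine_high_utility(15)` on this data skips the subtrees below {a, c} and {c}, whose promising utilities are 13 and 7, while still returning every itemset whose utility reaches the threshold.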

Appendix C: Proof of Theorem 6

We prove this theorem by contradiction. If possible, let i be added to the PRE-SET of j. This means that for any generator \(gen = \emptyset \cup j\), we have to test \(Tidlist(gen) \subseteq Tidlist(i)\) for the order-preserving property. If \(Tidlist(gen) \subseteq Tidlist(i)\), then there is a candidate closed itemset, say Y, which is an extension of i. Then, \(PU(Y) \geq min\_util\). But \(PU(i) \geq PU(Y)\) by Theorem 4. So, we have \(PU(i) \geq min\_util\), which contradicts the hypothesis that \(PU(i) < min\_util\). Hence, the theorem follows.

Appendix D: Proof of Theorem 7

We also prove this theorem by contradiction. Let \(Tidlist(gen^{\prime}) \subseteq Tidlist(i)\). Then \(Tidlist(gen^{\prime} \cup i) = Tidlist(gen^{\prime})\) and hence \(PU(gen^{\prime} \cup i) = PU(gen^{\prime})\). Again, \(Y^{\prime}\) is an extension of Y, so \(POST\_SET(gen^{\prime}) \subset POST\_SET(gen)\). Since \(Y \subset Y^{\prime}\), we can decompose \(gen^{\prime}\) as \(gen^{\prime} = Y \cup Z \cup i^{\prime}\) such that \(Z = Y^{\prime} \setminus Y\) and, for all \(j \in Z\), \(j \in POST\_SET(Y)\). We then have

$$\begin{array}{@{}rcl@{}} PU(gen^{\prime}) &=& PU(gen^{\prime}\cup i)\\ &=& \underset{t\in Tidlist(gen^{\prime})}{\sum}PU(gen^{\prime}\cup i, t)\\ &=& \underset{t\in Tidlist(gen^{\prime})}{\sum}(u(gen^{\prime}\cup i, t)+\widetilde{ru}(gen^{\prime}\cup i, t))\\ &=& \underset{t\in Tidlist(gen^{\prime})}{\sum}(u(Y, t)+u(Z, t)+u(i^{\prime}, t)+u(i, t)+\widetilde{ru}(gen^{\prime}\cup i, t))\\ &\leq& \underset{t\in Tidlist(gen^{\prime})}{\sum}(u(Y, t)+u(i, t)+\widetilde{ru}(Y\cup i, t)), \quad \text{since } POST\_SET(Y) \supseteq POST\_SET(Y^{\prime})\\ &\leq& \underset{t\in Tidlist(Y\cup i)}{\sum}(u(Y\cup i, t)+\widetilde{ru}(Y\cup i, t))\\ &=& PU(Y\cup i) = PU(gen) < min\_util. \end{array} $$

This contradicts the hypothesis that \(PU(gen^{\prime}) \geq min\_util\). Hence, the proof.

About this article

Cite this article

Sahoo, J., Das, A.K. & Goswami, A. An efficient fast algorithm for discovering closed+ high utility itemsets. Appl Intell 45, 44–74 (2016). https://doi.org/10.1007/s10489-015-0740-4

  • DOI: https://doi.org/10.1007/s10489-015-0740-4
