An efficient fast algorithm for discovering closed⁺ high utility itemsets

Sahoo, Jayakrushna; Das, Ashok Kumar; Goswami, A.

doi:10.1007/s10489-015-0740-4

Published: 25 January 2016

Applied Intelligence, volume 45, pages 44–74 (2016)

Abstract

In recent years, mining high utility itemsets (HUIs) from transactional databases has become one of the most active research topics in data mining, owing to its wide range of applications in online e-commerce data analysis, identifying interesting patterns in biomedical data, and cross-marketing solutions in retail business. It aims to discover itemsets with high utilities efficiently by considering the item quantities in each transaction and the profit value of each item. However, it produces a tremendous number of HUIs, which makes the extracted patterns harder to analyse and also degrades the performance of mining methods. Mining the set of closed⁺ high utility itemsets (CHUIs) addresses this issue, as it is a lossless and condensed representation of all HUIs. In this paper, we present a new scalable and efficient algorithm, called CHUM (Closed⁺ High Utility itemset Miner), for finding CHUIs from a transactional database. The proposed mining algorithm adopts a specially designed vertical representation of the database in order to speed up the generation of itemset closures and to compute their utility information without accessing the database. The proposed method also uses an item co-occurrence strategy to further reduce the number of intersections that need to be performed. Experiments on various sparse and dense datasets show the scalability and superior performance of our algorithm compared with the existing state-of-the-art CHUD (Closed⁺ High Utility itemset Discovery) algorithm.
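
As a rough illustration of the quantities the abstract refers to, the following Python sketch computes itemset utilities as purchased quantity times unit profit and then lists, by brute force, the high utility itemsets that are also closed. The toy database, the profit table and all function names are invented for illustration. "Closed" is taken here in the usual frequent-itemset sense (no proper superset occurs in exactly the same transactions), which the abstract itself does not spell out. This is an exhaustive reference enumeration, not the CHUM algorithm.

```python
from itertools import combinations

# Hypothetical toy data: transaction id -> {item: purchased quantity}
DB = {
    1: {"a": 1, "b": 2, "c": 1},
    2: {"a": 2, "b": 1},
    3: {"b": 3, "c": 2},
    4: {"a": 1, "b": 1, "c": 1},
}
# Hypothetical unit profit of each item
PROFIT = {"a": 5, "b": 2, "c": 1}


def tidlist(itemset):
    """Ids of the transactions that contain every item of the itemset."""
    return {tid for tid, t in DB.items() if all(i in t for i in itemset)}


def utility(itemset):
    """Sum of quantity * profit over all supporting transactions."""
    return sum(DB[tid][i] * PROFIT[i] for tid in tidlist(itemset) for i in itemset)


def closed_high_utility_itemsets(min_util):
    """Brute force: keep itemsets that are high utility and closed."""
    items = sorted(PROFIT)
    all_sets = [frozenset(c)
                for k in range(1, len(items) + 1)
                for c in combinations(items, k)]
    result = {}
    for x in all_sets:
        tx = tidlist(x)
        if not tx or utility(x) < min_util:
            continue
        # Closed: no proper superset has the same supporting transactions.
        if all(tidlist(y) != tx for y in all_sets if x < y):
            result[x] = utility(x)
    return result


if __name__ == "__main__":
    for itemset, value in closed_high_utility_itemsets(15).items():
        print(sorted(itemset), value)
```

On this toy data with a threshold of 15, the itemset {a} is high utility (utility 20) but is not reported, because {a, b} occurs in exactly the same transactions; only the closed itemsets are kept.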

Author information

Authors and Affiliations

Department of Mathematics, Indian Institute of Technology, Kharagpur, 721 302, India
Jayakrushna Sahoo & A. Goswami
Center for Security, Theory and Algorithmic Research, International Institute of Information Technology, Hyderabad, 500 032, India
Ashok Kumar Das

Corresponding author

Correspondence to A. Goswami.

Appendices

Appendix A: Proof of Theorem 3

Let $Y$ be an extension of $X$. Then $X \subset Y$. If $Tidlist(X)$ denotes the set of tids of $gUL(X)$ and $Tidlist(Y)$ the set of tids of $gUL(Y)$, then $Tidlist(Y) \subseteq Tidlist(X)$. Thus, for all $t \supseteq Y$, we have $\widetilde {Y/X}\subseteq \widetilde {t/X}$. Furthermore, we have

$$\begin{array}{@{}rcl@{}} u(Y,t) &=& u(X,t)+u(\widetilde{Y/X},t)=u(X,t)+\underset{i\in \widetilde{Y/X}}{\sum}u(i,t)\\ &\leq& u(X,t)+\underset{i\in \widetilde{t/X}}{\sum}u(i, t)\\ &=& u(X,t)+\widetilde{ru}(X,t). \end{array} $$

Hence,

$$\begin{array}{@{}rcl@{}} u(Y) &=& \underset{t\in Tidlist(Y)}{\sum}{u(Y,t)}\\ &\leq& \underset{t\in Tidlist(Y)}{\sum}{(u(X,t)+\widetilde{ru}(X,t))}< min\_util. \end{array} $$

As a result, the itemset Y is not a high utility itemset and the theorem follows.
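
The bound used in this proof can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not the paper's implementation: the toy database, the item order `ORDER` and the function names are hypothetical, and $\widetilde{t/X}$ is read as the items of $t$ that come after every item of $X$ in that order, so that an extension of $X$ adds only such items.

```python
from itertools import combinations

# Hypothetical toy data: transaction id -> {item: purchased quantity}
DB = {
    1: {"a": 1, "b": 2, "c": 1},
    2: {"a": 2, "b": 1},
    3: {"b": 3, "c": 2},
    4: {"a": 1, "b": 1, "c": 1},
}
PROFIT = {"a": 5, "b": 2, "c": 1}   # hypothetical unit profits
ORDER = ["a", "b", "c"]             # assumed total order on items


def u(itemset, t):
    """Utility of the itemset in one transaction t."""
    return sum(t[i] * PROFIT[i] for i in itemset)


def ru(itemset, t):
    """Remaining utility: items of t that follow the itemset in ORDER."""
    last = max(ORDER.index(i) for i in itemset)
    return sum(q * PROFIT[i] for i, q in t.items() if ORDER.index(i) > last)


def supporting(itemset):
    """Transactions that contain every item of the itemset."""
    return [t for t in DB.values() if all(i in t for i in itemset)]


def utility(itemset):
    return sum(u(itemset, t) for t in supporting(itemset))


def upper_bound(itemset):
    """Sum of u(X, t) + ru(X, t) over the transactions containing X."""
    return sum(u(itemset, t) + ru(itemset, t) for t in supporting(itemset))


def extensions(itemset):
    """All supersets obtained by appending items that follow X in ORDER."""
    last = max(ORDER.index(i) for i in itemset)
    tail = ORDER[last + 1:]
    for k in range(1, len(tail) + 1):
        for extra in combinations(tail, k):
            yield tuple(itemset) + extra


def can_prune(itemset, min_util):
    """True when the bound shows that no extension can be high utility."""
    return upper_bound(itemset) < min_util


if __name__ == "__main__":
    x = ("b",)
    print(upper_bound(x), [(y, utility(y)) for y in extensions(x)])
```

For example, with the threshold set to 20 the bound for {b} is 18, so {b} can be discarded together with its only extension {b, c}, whose utility is 16.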

Appendix B: Proof of Theorem 4

Let $Y$ be an extension of $X$. Clearly, $X \subset Y$. Let $Tidlist(X)$ denote the set of tids of $gUL(X)$ and $Tidlist(Y)$ the set of tids of $gUL(Y)$. Then $Tidlist(Y) \subseteq Tidlist(X)$. Thus, the promising utility value of $Y$ in $\mathcal {D}$ is given by

$$\begin{array}{@{}rcl@{}} PU(Y) &=& \underset{t\in Tidlist(Y)}{\sum}(u(Y, t)+\widetilde{ru}(Y, t))\\ &=& \underset{t\in Tidlist(Y)}{\sum}(u(X, t)+u(Y\setminus X, t)+\widetilde{ru}(Y, t))\\ &\leq& \underset{t\in Tidlist(Y)}{\sum}(u(X, t)+\widetilde{ru}(X, t)), \quad \text{since } POST\_SET(Y)\subseteq POST\_SET(X)\\ &\leq& \underset{t\in Tidlist(X)}{\sum}(u(X, t)+\widetilde{ru}(X, t))\\ &=& PU(X). \end{array} $$

Hence, the promising utility value of the itemset $Y$ is less than or equal to the promising utility value of $X$.
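
One way to see why this monotonicity matters is that a depth-first search over itemsets can skip the whole subtree below any $X$ whose promising utility is under the threshold. The following sketch is a plain high utility itemset enumerator written for illustration; it performs no closure computation and is not CHUM. The toy data, the item order and the names are assumptions, and $PU$ is computed as in the displayed formula with $\widetilde{ru}$ taken over the items that follow $X$ in the assumed order.

```python
# Hypothetical toy data: transaction id -> {item: purchased quantity}
DB = {
    1: {"a": 1, "b": 2, "c": 1},
    2: {"a": 2, "b": 1},
    3: {"b": 3, "c": 2},
    4: {"a": 1, "b": 1, "c": 1},
}
PROFIT = {"a": 5, "b": 2, "c": 1}   # hypothetical unit profits
ORDER = ["a", "b", "c"]             # assumed total order on items


def utility(itemset):
    """Sum of u(X, t) over the transactions containing X."""
    return sum(sum(t[i] * PROFIT[i] for i in itemset)
               for t in DB.values() if all(i in t for i in itemset))


def promising_utility(itemset):
    """PU(X): sum of u(X, t) + ru(X, t) over the transactions containing X."""
    last = max(ORDER.index(i) for i in itemset)
    total = 0
    for t in DB.values():
        if all(i in t for i in itemset):
            total += sum(t[i] * PROFIT[i] for i in itemset)
            total += sum(q * PROFIT[i] for i, q in t.items()
                         if ORDER.index(i) > last)
    return total


def mine(min_util):
    """Depth-first enumeration that prunes subtrees with PU < min_util."""
    found, visited = {}, []

    def dfs(prefix, start):
        for k in range(start, len(ORDER)):
            x = prefix + [ORDER[k]]
            visited.append("".join(x))
            if promising_utility(x) < min_util:
                continue  # neither x nor any extension of x can qualify
            value = utility(x)
            if value >= min_util:
                found["".join(x)] = value
            dfs(x, k + 1)

    dfs([], 0)
    return found, visited


if __name__ == "__main__":
    print(mine(20))
```

With the threshold at 20, the search stops at {b} because its promising utility is 18, so {b, c} is never generated; lowering the threshold to 15 makes the search descend into that branch and report {b, c}.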

Appendix C: Proof of Theorem 6

We prove this theorem by contradiction. If possible, let $i$ be added to the PRE-SET of $j$. This means that for any generator $gen = \emptyset \cup j$, we have to test $Tidlist(gen) \subseteq Tidlist(i)$ for the order-preserving property. If $Tidlist(gen) \subseteq Tidlist(i)$, then there exists a candidate closed itemset, say $Y^{\prime}$, which is an extension of $i$. Then $PU(Y^{\prime}) \geq min\_util$. But $PU(i) \geq PU(Y^{\prime})$ by Theorem 4. So, we have $PU(i) \geq min\_util$, which contradicts the hypothesis that $PU(i) < min\_util$. Hence, the theorem follows.

Appendix D: Proof of Theorem 7

We also prove this theorem by contradiction. Let $Tidlist(gen^{\prime}) \subseteq Tidlist(i)$. Then $Tidlist(gen^{\prime}\cup i) = Tidlist(gen^{\prime})$ and hence $PU(gen^{\prime}\cup i) = PU(gen^{\prime})$. Again, $Y^{\prime}$ is an extension of $Y$, so $POST\_SET(gen^{\prime}) \subset POST\_SET(gen)$. Since $Y \subseteq Y^{\prime}$, we can decompose $gen^{\prime}$ as $gen^{\prime} = Y\cup Z\cup i^{\prime}$ such that $Z = Y^{\prime}\setminus Y$ and, for all $j \in Z$, $j \in POST\_SET(Y)$. We then have

$$\begin{array}{@{}rcl@{}} PU(gen^{\prime}) &=& PU(gen^{\prime}\cup i)\\ &=& \underset{t\in Tidlist(gen^{\prime})}{\sum}PU(gen^{\prime}\cup i, t)\\ &=& \underset{t\in Tidlist(gen^{\prime})}{\sum}(u(gen^{\prime}\cup i, t)+\widetilde{ru}(gen^{\prime}\cup i, t))\\ &=& \underset{t\in Tidlist(gen^{\prime})}{\sum}(u(Y, t)+u(Z, t)+u(i^{\prime}, t)+u(i, t)+\widetilde{ru}(gen^{\prime}\cup i, t))\\ &\leq& \underset{t\in Tidlist(gen^{\prime})}{\sum}(u(Y, t)+u(i, t)+\widetilde{ru}(Y\cup i, t)), \quad \text{since } POST\_SET(Y)\supseteq POST\_SET(Y^{\prime})\\ &\leq& \underset{t\in Tidlist(Y\cup i)}{\sum}(u(Y\cup i, t)+\widetilde{ru}(Y\cup i, t))\\ &=& PU(Y\cup i) = PU(gen) < min\_util. \end{array} $$

This contradicts the hypothesis that $PU(gen^{\prime}) \geq min\_util$. Hence, the theorem follows.

About this article

Cite this article

Sahoo, J., Das, A.K. & Goswami, A. An efficient fast algorithm for discovering closed⁺ high utility itemsets. Appl Intell 45, 44–74 (2016). https://doi.org/10.1007/s10489-015-0740-4

Published: 25 January 2016
Issue Date: July 2016
DOI: https://doi.org/10.1007/s10489-015-0740-4
