Abstract
In recent years, mining high utility itemsets (HUIs) from transactional databases has become one of the most active research topics in data mining, owing to its wide range of applications in online e-commerce data analysis, the identification of interesting patterns in biomedical data, and cross-marketing solutions in retail business. It aims to efficiently discover itemsets with high utility by considering the quantity of each item in a transaction and the profit value of each item. However, it produces a tremendous number of HUIs, which burdens the analysis of the extracted patterns and degrades the performance of mining methods. Mining the set of closed+ high utility itemsets (CHUIs) addresses this issue, as it is a lossless and condensed representation of all HUIs. In this paper, we present a new algorithm for finding CHUIs in a transactional database, called CHUM (Closed+ High Utility itemset Miner), which is scalable and efficient. The proposed algorithm adopts a carefully designed vertical representation of the database to speed up the generation of itemset closures and to compute their utility information without accessing the database. It also uses an item co-occurrence strategy to further reduce the number of intersections that must be performed. Experiments on various sparse and dense datasets show the scalability and superior performance of our algorithm compared with the existing state-of-the-art CHUD (Closed+ High Utility itemset Discovery) algorithm.
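To make the notions in the abstract concrete, the following is a minimal sketch on a toy database. It is not the CHUM algorithm: it enumerates itemsets by brute force and only illustrates how a vertical (item to tid) layout lets utilities and closedness be computed from tid-list intersections. The database `DB`, the profit table `PROFIT`, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: tid -> {item: purchased quantity}
DB = {
    1: {"a": 1, "b": 2, "c": 1},
    2: {"a": 2, "c": 3},
    3: {"b": 1, "c": 1, "d": 4},
    4: {"a": 1, "b": 1, "c": 2},
}
# Hypothetical unit profit of each item
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}


def build_vertical(db, profit):
    """Vertical layout: item -> {tid: utility of the item in that transaction}."""
    vertical = {}
    for tid, items in db.items():
        for item, qty in items.items():
            vertical.setdefault(item, {})[tid] = qty * profit[item]
    return vertical


def tidlist(itemset, vertical):
    """Tids of the transactions that contain every item of the itemset."""
    return set.intersection(*(set(vertical[i]) for i in itemset))


def utility(itemset, vertical):
    """Utility of the itemset, summed over its supporting transactions."""
    tids = tidlist(itemset, vertical)
    return sum(vertical[i][t] for i in itemset for t in tids)


def closed_high_utility_itemsets(vertical, min_util):
    """Brute-force enumeration (illustration only, exponential in the item count)."""
    items = sorted(vertical)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            tids = tidlist(combo, vertical)
            if not tids:
                continue
            # Closed: no outside item occurs in every supporting transaction.
            closed = all(
                not tids <= set(vertical[j]) for j in items if j not in combo
            )
            u = utility(combo, vertical)
            if closed and u >= min_util:
                result[combo] = u
    return result


CHUIS = closed_high_utility_itemsets(build_vertical(DB, PROFIT), min_util=15)
```

With `min_util=15`, `CHUIS` holds three itemsets for this toy data: {a, c}, {a, b, c} and {b, c, d}. The original database is read only once, when the vertical layout is built.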
Appendices
Appendix A: Proof of Theorem 3
Let Y be an extension of X. Then X ⊂ Y. Let Tidlist(X) and Tidlist(Y) denote the sets of tids in gUL(X) and gUL(Y), respectively; then Tidlist(Y) ⊆ Tidlist(X). Thus, for every transaction t ⊇ Y, we have \(\widetilde {Y/X}\subseteq \widetilde {t/X}\). Furthermore, we have,
Hence,
As a result, the itemset Y is not a high utility itemset and the theorem follows.
Appendix B: Proof of Theorem 4
Let Y be an extension of X. Clearly, X ⊂ Y. Let Tidlist(X) and Tidlist(Y) denote the sets of tids in gUL(X) and gUL(Y), respectively. Then Tidlist(Y) ⊆ Tidlist(X). Thus, the promising utility value of Y in \(\mathcal {D}\) is given by
Hence, the promising utility value of the itemset Y is less than or equal to the promising utility value of X.
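The monotonicity argument can be checked numerically with a small sketch. The defining equation of the promising utility is not reproduced in this appendix, so the code substitutes an assumed stand-in: a non-negative bound per transaction (here, the transaction utility) summed over the tid-list. The only point it demonstrates is that a sum of non-negative terms over Tidlist(Y) ⊆ Tidlist(X) cannot exceed the sum over Tidlist(X). All names and values are hypothetical.

```python
# Hypothetical tid-lists of single items
ITEM_TIDS = {
    "a": {1, 2, 4},
    "b": {1, 3, 4},
    "c": {1, 2, 3, 4},
}
# Assumed non-negative per-transaction bound (transaction utility as a stand-in)
PER_TID_BOUND = {1: 10, 2: 13, 3: 15, 4: 9}


def tidlist(itemset, item_tids):
    """Tids shared by every item of the itemset."""
    return set.intersection(*(item_tids[i] for i in itemset))


def summed_bound(itemset, item_tids, per_tid_bound):
    """Sum of the per-transaction bound over the itemset's tid-list."""
    return sum(per_tid_bound[t] for t in tidlist(itemset, item_tids))


X = ("a",)
Y = ("a", "b")  # an extension of X

# The tid-list of the extension is contained in the tid-list of X ...
assert tidlist(Y, ITEM_TIDS) <= tidlist(X, ITEM_TIDS)
# ... so the summed bound of Y cannot exceed that of X.
assert summed_bound(Y, ITEM_TIDS, PER_TID_BOUND) <= summed_bound(X, ITEM_TIDS, PER_TID_BOUND)
```

Because the bound never increases along an extension, an itemset whose bound is already below min_util can be discarded together with all of its extensions.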
Appendix C: Proof of Theorem 6
We prove this theorem by contradiction. If possible, let i be added to the PRE-SET of j. This means that for any generator gen = ∅ ∪ j, we have to test whether Tidlist(gen) ⊆ Tidlist(i) for the order-preserving property. If Tidlist(gen) ⊆ Tidlist(i), we obtain a candidate closed itemset, say Y′, which is an extension of i. Then PU(Y′) ≥ min_util. But PU(i) ≥ PU(Y′) by Theorem 4. So we have PU(i) ≥ min_util, which contradicts the hypothesis that PU(i) < min_util. Hence, the theorem follows.
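The order-preserving test used in this proof can be sketched as a tid-list inclusion check. This is a minimal illustration in the style of the duplicate check of DCI-Closed, not the paper's implementation. The second helper reflects one reading of the theorem, namely that an item whose promising utility is below min_util need not be kept in the PRE-SET. The function names, the tid-lists, and the PU values are all hypothetical.

```python
# Hypothetical tid-lists of single items
TIDS = {
    "a": {1, 2, 4},
    "b": {1, 3, 4},
    "c": {1, 2, 3, 4},
    "d": {3},
}
# Hypothetical promising utility values of single items
PU = {"c": 47, "a": 32, "d": 15}


def is_order_preserving(gen_tids, pre_set, tidlists):
    """A generator fails the test when some PRE-SET item occurs in every
    transaction that supports it, because its closure would then contain
    an item that was processed earlier."""
    return not any(gen_tids <= tidlists[i] for i in pre_set)


def promising_pre_set(earlier_items, pu, min_util):
    """Keep only the earlier items whose promising utility reaches min_util
    (assumed reading of the theorem)."""
    return [i for i in earlier_items if pu[i] >= min_util]


# {a} is supported only by transactions that also contain c, so with c in the
# PRE-SET the generator {a} is not order preserving.
a_ok = is_order_preserving(TIDS["a"], ["c"], TIDS)
# {b} is supported by transaction 3, which does not contain a.
b_ok = is_order_preserving(TIDS["b"], ["a"], TIDS)
# d falls below min_util = 20 and is dropped from the PRE-SET.
kept = promising_pre_set(["c", "a", "d"], PU, min_util=20)
```

Filtering the PRE-SET in this way reduces the number of inclusion tests performed for each generator.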
Appendix D: Proof of Theorem 7
We also prove this theorem by contradiction. Let Tidlist(gen′) ⊆ Tidlist(i). Then Tidlist(gen′ ∪ i) = Tidlist(gen′) and hence PU(gen′ ∪ i) = PU(gen′). Again, Y′ is an extension of Y, so POST_SET(gen′) ⊂ POST_SET(gen). Since Y ⊆ Y′, we can decompose gen′ as gen′ = Y ∪ Z ∪ i′ such that Z = Y′ ∖ Y and, for all j ∈ Z, j ∈ POST_SET(Y). We then have
This contradicts the hypothesis that PU(gen′) ≥ min_util, which completes the proof.
Cite this article
Sahoo, J., Das, A.K. & Goswami, A. An efficient fast algorithm for discovering closed+ high utility itemsets. Appl Intell 45, 44–74 (2016). https://doi.org/10.1007/s10489-015-0740-4
DOI: https://doi.org/10.1007/s10489-015-0740-4