Abstract
Mining frequent itemsets (FIs) from dense datasets usually produces too many itemsets, so the mining task suffers from very long execution times and high memory consumption. Frequent closed itemsets (FCIs) are a compact and lossless representation of FIs. Mining FCIs not only reduces execution time and memory usage but also preserves the complete information of the FIs, which can be derived from the FCIs. Although many studies have proposed efficient methods for mining FCIs, few have developed algorithms for efficiently deriving FIs from FCIs. In this work, we propose two efficient algorithms, DFI-List and DFI-Growth, for this task. Both algorithms adopt depth-first search and a divide-and-conquer methodology to derive all the FIs. DFI-List derives the FIs with a vertical index structure called Cid List. DFI-Growth compresses the information of FCIs into tree structures and applies a pattern-growth strategy to derive FIs from the trees. Empirical experiments show that DFI-List is the most efficient and scalable algorithm on dense datasets. For example, when the minimum support threshold is set to 50% on the Chess dataset, DFI-List runs more than 100 times faster than LevelWise (Pasquier et al. Inf Syst 24(1): 25-46, 1999b). DFI-Growth is the most stable and memory-efficient algorithm on sparse datasets. Both DFI-Growth and DFI-List are superior to the state-of-the-art algorithm (Pasquier et al. Inf Syst 24(1): 25-46, 1999b) in terms of execution time.
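To make the derivation task concrete, the following is a minimal Python sketch, not the paper's DFI-List implementation. It relies on the standard property that the support of a frequent itemset equals the largest support among the closed itemsets that contain it. The function name `derive_fis`, the `cid_list` dictionary, and the toy input are assumptions made for illustration; the vertical index here is only loosely modeled on the Cid List idea of mapping each item to the identifiers of the FCIs that contain it.

```python
from typing import Dict, FrozenSet, Set, Tuple


def derive_fis(fcis: Dict[FrozenSet[str], int]) -> Dict[FrozenSet[str], int]:
    """Derive every frequent itemset and its support from closed itemsets.

    Illustrative sketch only: `fcis` maps each frequent closed itemset to
    its support count.
    """
    closed = list(fcis.items())

    # Vertical index (hypothetical layout): item -> ids of the closed
    # itemsets that contain that item.
    cid_list: Dict[str, Set[int]] = {}
    for cid, (itemset, _) in enumerate(closed):
        for item in itemset:
            cid_list.setdefault(item, set()).add(cid)

    items = sorted(cid_list)
    result: Dict[FrozenSet[str], int] = {}

    def extend(prefix: Tuple[str, ...], cids: Set[int], start: int) -> None:
        # Depth-first search: grow the prefix one item at a time.
        for i in range(start, len(items)):
            new_cids = cids & cid_list[items[i]]
            if not new_cids:
                continue  # no closed superset, so the itemset is not frequent
            itemset = prefix + (items[i],)
            # Support = largest support among the closed supersets.
            result[frozenset(itemset)] = max(closed[c][1] for c in new_cids)
            extend(itemset, new_cids, i + 1)

    extend((), set(range(len(closed))), 0)
    return result


# Toy input (assumed): closed itemsets of the transactions
# {a,b,c}, {a,b}, {a,c}, {a} with a minimum support count of 1.
closed_itemsets = {
    frozenset("a"): 4,
    frozenset("ab"): 2,
    frozenset("ac"): 2,
    frozenset("abc"): 1,
}
frequent_itemsets = derive_fis(closed_itemsets)
```

In this toy case the four closed itemsets expand to seven frequent itemsets; for instance, {b} inherits support 2 from its closure {a, b}. The sketch intersects identifier sets along each search branch, so a branch is cut as soon as no closed itemset contains the current prefix.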
References
Agrawal R, Srikant R, et al. (1994) Fast algorithms for mining association rules. In: Proc. 20th int. conf. very large data bases, VLDB, vol 1215. Citeseer, pp 487–499
Aryabarzan N, Minaei-Bidgoli B, Teshnehlab M (2018) negfin: An efficient algorithm for fast mining frequent itemsets. Expert Systems with Applications 105:129–143
Boulicaut JF, Bykowski A, Rigotti C (2000) Approximation of frequency queries by means of free-sets. In: European conference on principles of data mining and knowledge discovery. Springer, pp 75–85
Calders T, Goethals B (2002) Mining all non-derivable frequent itemsets. In: European conference on principles of data mining and knowledge discovery. Springer, pp 74–86
Deng ZH (2016) Diffnodesets: An efficient structure for fast mining frequent itemsets. Appl Soft Comput 41:214–223
El-Hajj M, Zaiane OR (2003) Cofi-tree mining: a new approach to pattern growth with reduced candidacy generation. In: Workshop on frequent itemset mining implementations (FIMI’03) in conjunction with IEEE-ICDM
Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu CW, Tseng VS, et al. (2014) SPMF: A Java open-source pattern mining library. J Mach Learn Res 15(1):3389–3393
Gouda K, Zaki MJ (2005) Genmax: An efficient algorithm for mining maximal frequent itemsets. Data Min Knowl Disc 11(3):223–242
Gupta S, Mamtora R (2014) A survey on association rule mining in market basket analysis. Int J Inf Comput Technol 4(4):409–414
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. ACM SIGMOD Record 29(2):1–12
Huang J, Lai YP, Lo C, Wu CW (2019) An efficient algorithm for deriving frequent itemsets from lossless condensed representation. In: International conference on industrial, engineering and other applications of applied intelligent systems. Springer, pp 216–229
Kim D, Yun U (2016) Efficient mining of high utility pattern with considering of rarity and length. Appl Intell 45(1):152–173
Kim D, Yun U (2017) Efficient algorithm for mining high average-utility itemsets in incremental transaction databases. Appl Intell 47(1):114–131
Le T, Vo B (2015) An n-list-based algorithm for mining frequent closed patterns. Expert Syst Appl 42(19):6648–6657
Lee G, Yun U (2017) A new efficient approach for mining uncertain frequent patterns using minimum data structure without false positives. Futur Gener Comput Syst 68:89–110
Lee G, Yun U, Ryang H, Kim D (2016) Approximate maximal frequent pattern mining with weight conditions and error tolerance. Int J Pattern Recogn Artif Intell 30(06):1650012
Liu J, Shang J, Wang C, Ren X, Han J (2015) Mining quality phrases from massive text corpora. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, pp 1729–1744
Lucchese C, Orlando S, Perego R (2005) Fast and memory efficient mining of frequent closed itemsets. IEEE Trans Knowl Data Eng 18(1):21–36
Park JS, Chen MS, Yu PS (1995) An effective hash-based algorithm for mining association rules. ACM SIGMOD Record 24(2):175–186
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999a) Discovering frequent closed itemsets for association rules. In: International conference on database theory. Springer, pp 398–416
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999b) Efficient mining of association rules using closed itemset lattices. Inf Syst 24(1):25–46
Pei J, Han J, Lu H, Nishio S, Tang S, Yang D (2001) H-mine: Hyper-structure mining of frequent patterns in large databases. In: Proceedings 2001 IEEE international conference on data mining. IEEE, pp 441–448
Prabha S, Shanmugapriya S, Duraiswamy K (2013) A survey on closed frequent pattern mining. Int J Comput Appl 63(14)
Ryang H, Yun U (2017) Indexed list-based high utility pattern mining with utility upper-bound reduction and pattern combination techniques. Knowl Inf Syst 51(2):627–659
Ting S, Shum C, Kwok SK, Tsang AH, Lee WB, et al. (2009) Data mining in biomedicine: Current applications and further directions for research. J Softw Eng Appl 2(03):150
Wang J, Han J, Pei J (2003) CLOSET+: searching for the best strategies for mining frequent closed itemsets. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, pp 236–245
Yun U, Lee G, Yoon E (2017) Efficient high utility pattern mining for establishing manufacturing plans with sliding window control. IEEE Trans Ind Electron 64(9):7239–7249
Yun U, Ryang H, Lee G, Fujita H (2017) An efficient algorithm for mining high utility patterns from incremental databases with one database scan. Knowl-Based Syst 124:188–206
Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390
Zaki MJ, Hsiao CJ (2002) Charm: An efficient algorithm for closed itemset mining. In: Proceedings of the 2002 SIAM international conference on data mining. SIAM, pp 457–473
Zhang Q, Segall RS (2008) Web mining: A survey of current research, techniques, and software. Int J Inf Technol Decision Making 7(04):683–720
Source code of the implemented dfi-growth algorithm released in spmf. http://www.philippe-fournier-viger.com/spmf/DFI-Growth.php
Source code of the implemented dfi-list algorithm released in spmf. http://www.philippe-fournier-viger.com/spmf/DFI-List.php
Source code of the implemented levelwise algorithm released in spmf. http://www.philippe-fournier-viger.com/spmf/LevelWise.php
Acknowledgments
This work is partially supported by Ministry of Science and Technology, Taiwan, under Grant No. 109-2221-E-197-027 and 109-2634-F-009-026 through Pervasive Artificial Intelligence Research (PAIR) Labs.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Special issue on Artificial intelligence in practice - from theory to application
Guest Editors: Franz Wotawa, Gerhard Friedrich and Ingo Pill
About this article
Cite this article
Wu, CW., Huang, J., Lin, YW. et al. Efficient algorithms for deriving complete frequent itemsets from frequent closed itemsets. Appl Intell 52, 7002–7023 (2022). https://doi.org/10.1007/s10489-020-02172-7
DOI: https://doi.org/10.1007/s10489-020-02172-7