Abstract
Mining itemsets for association rule generation is a fundamental data mining task that originated in the traditional market basket analysis problem. However, enumerating all frequent itemsets (FIs), especially in dense datasets or with low support thresholds, remains costly. In this paper, a novel theorem establishes the relationship between frequent closed itemsets (FCIs) and frequent generator itemsets (FGIs) and proves that mining FCIs is equivalent to mining FGIs, unified with their full-support and extension items. On the basis of this theorem, a generator-based algorithm for mining FCIs, called GrAFCI+, is proposed and explained in detail, including its correctness. The comparative effectiveness of the algorithm in terms of scalability is first investigated, along with the compression rate, a measure of the interestingness of a given FI representation. Extensive experiments are then undertaken on eight datasets against four state-of-the-art algorithms, namely DCI_CLOSED*, DCI_PLUS, FPClose, and NAFCP. The results show that, in most cases, the proposed algorithm has a shorter execution time than these algorithms. Because the main goal of GrAFCI+ is to address the runtime issue, it pays a memory cost, especially when the support is very small. However, this cost is not high: GrAFCI+ is surpassed in memory utilization by only one of its four competitors, and for large support values. As an overall assessment, GrAFCI+ gives better results than most of its competitors.
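The relationship between closed itemsets and generators can be illustrated on a toy dataset. The following is a minimal brute-force sketch, not GrAFCI+ itself: the transactions, the `min_support` value, and the helper names `support` and `closure` are all hypothetical choices made for illustration. It uses the standard definitions (an itemset is closed if no proper superset has the same support; it is a generator if no proper subset has the same support) and checks that the closures of the frequent generators are exactly the frequent closed itemsets.

```python
from itertools import combinations

# Hypothetical toy dataset; each transaction is a set of items.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "d"},
    {"d"},
]
min_support = 2  # absolute support threshold (assumed for this example)


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset):
    """Items shared by all transactions containing `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return frozenset(itemset)
    return frozenset(set.intersection(*covering))


items = sorted(set().union(*transactions))

# Brute-force enumeration of all frequent itemsets (fine only for toy data).
frequent = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if support(frozenset(c)) >= min_support
]

# Closed: equal to its own closure.
closed = {x for x in frequent if closure(x) == x}

# Generator: removing any single item strictly increases the support.
# Checking immediate subsets is enough because support is anti-monotone.
generators = {
    x for x in frequent
    if all(support(x - {i}) > support(x) for i in x)
}

# Closing every frequent generator recovers the frequent closed itemsets.
closures_of_generators = {closure(g) for g in generators}

print("frequent:  ", sorted(sorted(x) for x in frequent))
print("closed:    ", sorted(sorted(x) for x in closed))
print("generators:", sorted(sorted(x) for x in generators))
print("match:     ", closures_of_generators == closed)
```

The dataset is chosen so that no item occurs in every transaction, which keeps the empty itemset out of the picture; a practical miner would avoid this exhaustive enumeration entirely.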
Notes
The actual URL is http://fimi.uantwerpen.be/data/
Precise calculations can readily be done using formula 9 and data from Table 8. For example, for the Accidents dataset with \(20\%\) support, \(CR(FC) = 887\,441 / 889\,936 \approx 99.72\%\), which is an extremely poor compression: about \(99.7\%\) of the FIs are needed to represent \(100\%\) of them. For the Connect dataset with \(30\%\) support, \(CR(FC) = 460\,412 / 1\,331\,880\,801 \approx 0.035\%\), which is an excellent compression rate: only about \(0.035\%\) of the FIs need to be kept in memory to represent \(100\%\) of them.
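The arithmetic in this note can be reproduced with a few lines of Python. This is a minimal sketch that assumes formula 9 is the ratio of the number of FCIs to the number of FIs, as the two worked quotients suggest; the function name `compression_rate` is hypothetical, and the counts are the ones quoted in the note.

```python
def compression_rate(num_closed: int, num_frequent: int) -> float:
    """Compression rate in percent, assumed to be |FCIs| / |FIs| * 100."""
    if num_frequent <= 0:
        raise ValueError("the number of frequent itemsets must be positive")
    return 100.0 * num_closed / num_frequent


# Counts quoted in the note (Accidents at 20% support, Connect at 30%).
accidents = compression_rate(887_441, 889_936)
connect = compression_rate(460_412, 1_331_880_801)

print(f"Accidents: {accidents:.2f}%")
print(f"Connect:   {connect:.4f}%")
```

A value close to 100% means the closed representation saves almost nothing, while a value close to 0% means a small fraction of the itemsets suffices to represent them all.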
References
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on very large data bases (VLDB ’94), Morgan Kaufmann Publishers Inc., pp 487–499
Alves R, Rodríguez-Baena DS, Aguilar-Ruiz JS (2010) Gene association analysis: a survey of frequent pattern mining from gene expression data. Briefings Bioinform 11(2):210–224
Burdick D, Calimlim M, Flannick J, Gehrke J, Yiu T (2005) MAFIA: a maximal frequent itemset algorithm. IEEE Trans Knowl Data Eng 17(11):1490–1504
Deng Z, Lv S (2015) Prepost+: an efficient n-lists-based algorithm for mining frequent itemsets via children-parent equivalence pruning. Expert Syst Appl 42(13):5424–5432
Deng Z, Wang Z (2010) A new fast vertical method for mining frequent patterns. Int J Comput Intell Syst 3(6):733–744
Deng Z, Wang Z, Jiang J (2012) A new algorithm for fast mining frequent itemsets using n-lists. Sci China Inform Sci 55(9):2008–2030
Djenouri Y, Djenouri D, Belhadi A, Fournier-Viger P, Lin JCW (2018) A new framework for metaheuristic-based frequent itemset mining. Appl Intell 48(12):4775–4791
Dong G, Feng M, Son NT, Lee TS, Li J, Liu G, Wong L (2002) Pattern space projects. https://www.comp.nus.edu.sg/~wongls/projects/pattern-spaces/
FIMI (2003) Frequent itemset mining dataset repository. http://fimi.cs.helsinki.fi/data/
Fournier-Viger P, Lin JC, Gomariz A, Gueniche T, Soltani A, Deng Z, Lam HT (2016) The SPMF open-source data mining library version 2. In: LNCS, vol 9853, pp 36–40
Goethals B, Zaki MJ (2004) Advances in frequent itemset mining implementations: Report on FIMI03. SIGKDD Explorat 6(1):109–117
Grahne G, Zhu J (2005) Fast algorithms for frequent itemset mining using FP-trees. IEEE Trans Knowl Data Eng 17(10):1347–1362
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. SIGMOD Rec 29(2):1–12
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining Knowl Disc 8(1):53–87
Han J, Kamber M, Pei J (2011) Data Mining: Concepts and Techniques, chap 6, 3rd edn. Morgan Kaufmann Publishers, Burlington, pp 243–278
Kryszkiewicz M (2001) Concise representation of frequent patterns based on disjunction-free generators. In: Proceedings 2001 IEEE international conference on data mining, pp 305–312
Le T, Vo B (2015) An n-list-based algorithm for mining frequent closed patterns. Expert Syst Appl 42(19):6648–6657
Li J, Li H, Wong L, Pei J, Dong G (2006) Minimum description length principle: generators are preferable to closed patterns. In: Proceedings of the 21st national conference on artificial intelligence - Volume 1, AAAI Press, pp 409–414
Liu G, Li J, Wong L (2008) A new concise representation of frequent itemsets using generators and a positive border. Knowl Inf Syst 17(1):35–56
Lucchese C, Orlando S, Perego R (2006) Fast and memory efficient mining of frequent closed itemsets. IEEE Trans Knowl Data Eng 18(1):21–36
Nam H, Yun U, Yoon E, Lin JCW (2020) Efficient approach for incremental weighted erasable pattern mining with list structure. Expert Syst Appl 143:113087
Pan F, Cong G, Tung AKH, Yang J, Zaki MJ (2003) Carpenter: Finding closed patterns in long biological datasets. In: Proceedings of the 9th ACM SIGKDD conference, pp 637–642
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Efficient mining of association rules using closed itemset lattices. Inf Syst 24(1):25–46
Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: workshop on research issues in data mining and knowledge discovery, pp 21–30
Pei J, Dong G, Zou W, Han J (2004) Mining condensed frequent-pattern bases. Knowl Inf Syst 6(5):570–594
Sahoo J, Ashok KD, Goswami A (2015) An effective association rule mining scheme using a new generic basis. Knowl Inf Syst 43(1):127–156
Sun J, Xun Y, Zhang J, Li J (2019) Incremental frequent itemsets mining with FCFP tree. IEEE Access 7:136511–136524
Vo B, Hong TP, Le B (2012) DBV-Miner: a dynamic bit-vector approach for fast mining frequent closed itemsets. Expert Syst Appl 39(8):7196–7206
Vo B, Le T, Coenen F, Hong T (2016) Mining frequent itemsets using the n-list and subsume concepts. Int J Mach Learn Cyber 7(2):253–265
Vo B, Pham S, Le T, Deng Z (2017) A novel approach for mining maximal frequent patterns. Expert Syst Appl 73:178–186
Wang J, Han J, Pei J (2003) CLOSET+: Searching for the best strategies for mining frequent closed itemsets. In: Proceedings of the 9th ACM SIGKDD conference, pp 236–245
Xu Y, Li Y (2007) Generating concise association rules. In: Proceedings of the sixteenth ACM conference on information and knowledge management (CIKM ’07), pp 781–790
Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390
Zaki MJ, Hsiao C (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17(4):462–478
Zhang C, Tian P, Zhang X, Liao Q, Jiang ZL, Wang X (2019) HashEclat: an efficient frequent itemset algorithm. Int J Mach Learn Cyber 10:3003–3016
Acknowledgements
The authors would like to thank the Editor, the Editor-in-Chief, and the anonymous reviewers for their constructive comments, which pointed to directions of research and additional experimental work that greatly improved the manuscript. The authors would also like to thank Tuong Le and Bay Vo for providing the source code for NAFCP and Sahoo Jayakrushna for the source code for DCI_PLUS.
Cite this article
Ledmi, M., Zidat, S. & Hamdi-Cherif, A. GrAFCI+ A fast generator-based algorithm for mining frequent closed itemsets. Knowl Inf Syst 63, 1873–1908 (2021). https://doi.org/10.1007/s10115-021-01575-3
DOI: https://doi.org/10.1007/s10115-021-01575-3