GrAFCI⁺ A fast generator-based algorithm for mining frequent closed itemsets

Ledmi, Makhlouf; Zidat, Samir; Hamdi-Cherif, Aboubekeur

doi:10.1007/s10115-021-01575-3

Regular Paper
Published: 18 May 2021

Volume 63, pages 1873–1908 (2021)

Knowledge and Information Systems

Abstract

Mining itemsets for association rule generation is a fundamental data mining task that originally stems from the traditional market basket analysis problem. However, enumerating all frequent itemsets (FIs), especially in dense datasets or with low support thresholds, remains costly. In this paper, a novel theorem establishes the relationship between frequent closed itemsets (FCIs) and frequent generator itemsets (FGIs) and proves that mining FCIs is equivalent to mining FGIs together with their full-support and extension items. Based on this theorem, a generator-based algorithm for mining FCIs, called GrAFCI⁺, is proposed and explained in detail, including its correctness. The comparative scalability of the algorithm is first investigated, along with its compression rate, a measure of the interestingness of a given representation of FIs. Extensive experiments are then conducted on eight datasets against four state-of-the-art algorithms, namely DCI_CLOSED*, DCI_PLUS, FPClose, and NAFCP. The results show that the proposed algorithm has a shorter execution time than these algorithms in most cases. Because the main goal of GrAFCI⁺ is to reduce runtime, it incurs a memory cost, especially when the support threshold is very small. However, this cost is moderate: in memory usage, and for large support values, GrAFCI⁺ is outperformed by only one of its four competitors. Overall, GrAFCI⁺ gives better results than most of its competitors.
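The equivalence stated in the abstract builds on a standard property of the closure operator: the closure of a frequent generator is always a frequent closed itemset, and every frequent closed itemset is the closure of at least one frequent generator. The brute-force Python sketch below checks this property on a small hypothetical transaction database. It does not implement GrAFCI⁺, which avoids exhaustive enumeration; the dataset, the absolute support threshold `MIN_SUP`, and all function names are illustrative assumptions, and the empty itemset is included so that the check holds for any database.

```python
from itertools import combinations

# Hypothetical toy transaction database (not taken from the paper).
TRANSACTIONS = [
    frozenset("abc"),
    frozenset("abcd"),
    frozenset("ac"),
    frozenset("bd"),
    frozenset("abcd"),
]
MIN_SUP = 2  # assumed absolute support threshold


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def closure(itemset, db):
    """Intersection of all transactions containing `itemset` (assumes support > 0)."""
    covering = [t for t in db if itemset <= t]
    return frozenset.intersection(*covering)


def is_generator(itemset, db):
    """True if no immediate subset has the same support.

    Support is anti-monotone, so checking the immediate subsets is enough
    to rule out every proper subset with equal support.
    """
    sup = support(itemset, db)
    return all(support(itemset - {i}, db) != sup for i in itemset)


def frequent_itemsets(db, min_sup):
    """Brute-force enumeration of all frequent itemsets, including the empty set."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            sup = support(candidate, db)
            if sup >= min_sup:
                result[candidate] = sup
    return result


fis = frequent_itemsets(TRANSACTIONS, MIN_SUP)
fcis = {x for x in fis if closure(x, TRANSACTIONS) == x}  # frequent closed itemsets
fgis = {x for x in fis if is_generator(x, TRANSACTIONS)}  # frequent generators
closures_of_fgis = {closure(g, TRANSACTIONS) for g in fgis}

assert fcis == closures_of_fgis
print(len(fis), len(fcis), len(fgis))  # 16 6 9
```

Here nine generators collapse onto six closed itemsets, which is why enumerating generators and extending each one by the items shared by all of its supporting transactions suffices to recover every FCI. The exhaustive enumeration is exponential in the number of items and is only meant as an illustration.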

Notes

https://www.philippe-fournier-viger.com/spmf/index.php?link=algorithms.php
https://www.comp.nus.edu.sg/~wongls/projects/pattern-spaces/
https://github.com/rionda/truefis/tree/master/code/grahne
The current URL is http://fimi.uantwerpen.be/data/
Precise calculations can readily be done using formula (9) and the data in Table 8. For example, for the Accidents dataset at \(20\%\) support, \(CR(FC) = 887\,441/889\,936 \approx 99.72\%\), which is an extremely poor compression: the FCIs number about \(99.7\%\) of the FIs they represent. For the Connect dataset at \(30\%\) support, \(CR(FC) = 460\,412/1\,331\,880\,801 \approx 0.035\%\), which is an excellent compression rate: only about \(0.035\%\) as many itemsets need to be kept in memory to represent all of the FIs.
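The two worked examples in this note can be reproduced with a few lines of Python. The sketch assumes, as the examples imply, that formula (9) defines CR(FC) as the number of FCIs divided by the number of FIs, expressed as a percentage; the function name `compression_rate` is illustrative.

```python
def compression_rate(num_closed: int, num_frequent: int) -> float:
    """CR(FC) = |FCIs| / |FIs| as a percentage (assumed reading of formula (9)).

    A lower value means a more compact representation of the FIs.
    """
    if num_frequent <= 0:
        raise ValueError("num_frequent must be positive")
    return 100.0 * num_closed / num_frequent


accidents = compression_rate(887_441, 889_936)      # Accidents, 20% support
connect = compression_rate(460_412, 1_331_880_801)  # Connect, 30% support
print(f"Accidents: {accidents:.2f}%")  # Accidents: 99.72%
print(f"Connect:   {connect:.3f}%")    # Connect:   0.035%
```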

References

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on very large data bases (VLDB ’94), Morgan Kaufmann Publishers Inc., pp 487–499
Alves R, Rodríguez-Baena DS, Aguilar-Ruiz JS (2010) Gene association analysis: a survey of frequent pattern mining from gene expression data. Briefings Bioinform 11(2):210–224
Burdick D, Calimlim M, Flannick J, Gehrke J, Yiu T (2005) MAFIA: a maximal frequent itemset algorithm. IEEE Trans Knowl Data Eng 17(11):1490–1504
Deng Z, Lv S (2015) Prepost+: an efficient n-lists-based algorithm for mining frequent itemsets via children-parent equivalence pruning. Expert Syst Appl 42(13):5424–5432
Deng Z, Wang Z (2010) A new fast vertical method for mining frequent patterns. Int J Comput Intell Syst 3(6):733–744
Deng Z, Wang Z, Jiang J (2012) A new algorithm for fast mining frequent itemsets using n-lists. Sci China Inform Sci 55(9):2008–2030
Djenouri Y, Djenouri D, Belhadi A, Fournier-Viger P, Lin JCW (2018) A new framework for metaheuristic-based frequent itemset mining. Appl Intell 48(12):4775–4791
Dong G, Feng M, Son NT, Lee TS, Li J, Liu G, Wong L (2002) Pattern space projects. https://www.comp.nus.edu.sg/~wongls/projects/pattern-spaces/
FIMI (2003) Frequent itemset mining dataset repository. http://fimi.cs.helsinki.fi/data/
Fournier-Viger P, Lin JC, Gomariz A, Gueniche T, Soltani A, Deng Z, Lam HT (2016) The SPMF open-source data mining library version 2. In: LNCS, vol 9853, pp 36–40
Goethals B, Zaki MJ (2004) Advances in frequent itemset mining implementations: Report on FIMI03. SIGKDD Explorat 6(1):109–117
Grahne G, Zhu J (2005) Fast algorithms for frequent itemset mining using fp-trees. IEEE Trans Knowl Data Eng 17(10):1347–1362
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. SIGMOD Rec 29(2):1–12
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining Knowl Disc 8(1):53–87
Han J, Kamber M, Pei J (2011) Data Mining: Concepts and Techniques, chap 6, 3rd edn. Morgan Kaufmann Publishers, Burlington, pp 243–278
Kryszkiewicz M (2001) Concise representation of frequent patterns based on disjunction-free generators. In: Proceedings 2001 IEEE international conference on data mining, pp 305–312
Le T, Vo B (2015) An n-list-based algorithm for mining frequent closed patterns. Expert Syst Appl 42(19):6648–6657
Li J, Li H, Wong L, Pei J, Dong G (2006) Minimum description length principle: generators are preferable to closed patterns. In: Proceedings of the 21st national conference on artificial intelligence - Volume 1, AAAI Press, pp 409–414
Liu G, Li J, Wong L (2008) A new concise representation of frequent itemsets using generators and a positive border. Knowl Inf Syst 17(1):35–56
Lucchese C, Orlando S, Perego R (2006) Fast and memory efficient mining of frequent closed itemsets. IEEE Trans Knowl Data Eng 18(1):21–36
Nam H, Yun U, Yoon E, Lin JCW (2020) Efficient approach for incremental weighted erasable pattern mining with list structure. Expert Syst Appl 143:113087
Pan F, Cong G, Tung AKH, Yang J, Zaki MJ (2003) Carpenter: Finding closed patterns in long biological datasets. In: Proceedings of the 9th ACM SIGKDD conference, pp 637–642
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Efficient mining of association rules using closed itemset lattices. Inf Syst 24(1):25–46
Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: workshop on research issues in data mining and knowledge discovery, pp 21–30
Pei J, Dong G, Zou W, Han J (2004) Mining condensed frequent-pattern bases. Knowl Inf Syst 6(5):570–594
Sahoo J, Das AK, Goswami A (2015) An effective association rule mining scheme using a new generic basis. Knowl Inf Syst 43(1):127–156
Sun J, Xun Y, Zhang J, Li J (2019) Incremental frequent itemsets mining with FCFP tree. IEEE Access 7:136511–136524
Vo B, Hong TP, Le B (2012) DBV-Miner: a dynamic bit-vector approach for fast mining frequent closed itemsets. Expert Syst Appl 39(8):7196–7206
Vo B, Le T, Coenen F, Hong T (2016) Mining frequent itemsets using the n-list and subsume concepts. Int J Mach Learn Cyber 7(2):253–265
Vo B, Pham S, Le T, Deng Z (2017) A novel approach for mining maximal frequent patterns. Expert Syst Appl 73:178–186
Wang J, Han J, Pei J (2003) Closet+: Searching for the best strategies for mining frequent closed itemsets. In: Proceedings of the 9th ACM SIGKDD conference, pp 236–245
Xu Y, Li Y (2007) Generating concise association rules. In: Proceedings of the sixteenth ACM conference on information and knowledge management (CIKM ’07), pp 781–790
Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390
Zaki MJ, Hsiao C (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17(4):462–478
Zhang C, Tian P, Zhang X, Liao Q, Jiang ZL, Wang X (2019) HashEclat: an efficient frequent itemset algorithm. Int J Mach Learn Cyber 10:3003–3016

Acknowledgements

The authors would like to thank the Editor, the Editor-in-Chief, and the anonymous reviewers for their constructive comments, which pointed to directions of research and additional experimental work that greatly improved the manuscript. The authors would also like to thank Tuong Le and Bay Vo for providing the source code for NAFCP, and Jayakrushna Sahoo for the source code for DCI_PLUS.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Abbes Laghrour University of Khenchela, Khenchela, 40000, Algeria
Makhlouf Ledmi
Department of Computer Science, Chahid Mostefa Ben Boulaid University of Batna 2, Batna, Algeria
Samir Zidat
Department of Computer Science, Ferhat Abbas University of Setif 1, Setif, Algeria
Aboubekeur Hamdi-Cherif

Corresponding author

Correspondence to Makhlouf Ledmi.

About this article

Cite this article

Ledmi, M., Zidat, S. & Hamdi-Cherif, A. GrAFCI⁺ A fast generator-based algorithm for mining frequent closed itemsets. Knowl Inf Syst 63, 1873–1908 (2021). https://doi.org/10.1007/s10115-021-01575-3

Received: 07 January 2019
Revised: 17 April 2021
Accepted: 26 April 2021
Published: 18 May 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s10115-021-01575-3
