GrAFCI+ A fast generator-based algorithm for mining frequent closed itemsets

  • Regular Paper

Knowledge and Information Systems

Abstract

Mining itemsets for association rule generation is a fundamental data mining task that originally stems from the traditional market basket analysis problem. However, enumerating all frequent itemsets (FIs) remains costly, especially in dense datasets or with low support thresholds. In this paper, a novel theorem establishes the relationship between frequent closed itemsets (FCIs) and frequent generator itemsets (FGIs) and proves that mining FCIs is equivalent to mining FGIs, unified with their full-support and extension items. On the basis of this theorem, a generator-based algorithm for mining FCIs, called GrAFCI+, is proposed and explained in detail, and its correctness is shown. The comparative effectiveness of the algorithm is first investigated in terms of scalability, along with the compression rate, a measure of the interestingness of a given representation of FIs. Extensive experiments are then carried out on eight datasets against four state-of-the-art algorithms, namely DCI_CLOSED*, DCI_PLUS, FPClose, and NAFCP. The results show that the proposed algorithm has a shorter execution time than these algorithms in most cases. Because the main goal of GrAFCI+ is to reduce runtime, it pays a memory cost, especially when the support threshold is very small. However, this cost is moderate: in memory usage, GrAFCI+ is outperformed by only one of the four competitors, and only for large support values. Overall, GrAFCI+ gives better results than most of its competitors.
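The central idea above, that frequent closed itemsets can be recovered from frequent generators, rests on the closure operator. The closure of an itemset is the intersection of all transactions that contain it, and a generator is an itemset none of whose proper subsets has the same support. The Python sketch below illustrates this standard relationship by brute force on a small, hypothetical transaction database. It is not GrAFCI+ and does not model the paper's full-support or extension items; the names `TRANSACTIONS`, `closure`, `frequent_itemsets`, and `closed_and_generators` are illustrative only.

```python
from itertools import combinations

# Hypothetical toy transaction database (illustration only, not from the paper).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
    {"a", "c"},
    {"b", "d"},
    {"a", "b", "c", "e"},
]


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def closure(itemset, db):
    """Intersection of all transactions containing `itemset` (Galois closure)."""
    covering = [t for t in db if itemset <= t]
    if not covering:
        return frozenset(itemset)
    return frozenset(set.intersection(*covering))


def frequent_itemsets(db, minsup):
    """Brute-force level-wise enumeration, including the empty itemset.

    Stops as soon as a level has no frequent itemset (support is anti-monotone).
    Exponential in the number of items: suitable for toy data only.
    """
    items = sorted(set().union(*db))
    result = {}
    for k in range(len(items) + 1):
        level = {}
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            s = support(candidate, db)
            if s >= minsup:
                level[candidate] = s
        if not level:
            break
        result.update(level)
    return result


def closed_and_generators(db, minsup):
    """Return (frequent itemsets with supports, closed itemsets, generators)."""
    fis = frequent_itemsets(db, minsup)
    # Closed: the itemset equals its own closure.
    closed = {x for x in fis if closure(x, db) == x}
    # Generator: every immediate subset has strictly larger support
    # (checking immediate subsets suffices because support is anti-monotone).
    generators = {x for x in fis if all(fis[x - {i}] != fis[x] for i in x)}
    return fis, closed, generators


if __name__ == "__main__":
    fis, closed, gens = closed_and_generators(TRANSACTIONS, minsup=2)
    # Each closed itemset is the closure of at least one generator, and vice versa.
    assert {closure(g, TRANSACTIONS) for g in gens} == closed
    for c in sorted(closed, key=lambda s: (len(s), sorted(s))):
        print(sorted(c), fis[c])
```

Enumerating every candidate is exponential in the number of items, which is precisely the cost that dedicated FCI miners try to avoid; the sketch only checks, on toy data, that the closures of the generators coincide with the closed itemsets.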

Figs. 1–10

Notes

  1. https://www.philippe-fournier-viger.com/spmf/index.php?link=algorithms.php

  2. https://www.comp.nus.edu.sg/~wongls/projects/pattern-spaces/

  3. https://github.com/rionda/truefis/tree/master/code/grahne

  4. The current URL is http://fimi.uantwerpen.be/data/

  5. Precise calculations can readily be made using formula 9 and the data in Table 8. For example, for the Accidents dataset at \(20\%\) support, \(CR(FC) = 887\,441/889\,936 \approx 99.72\%\), which is extremely poor compression: the FCIs needed to represent \(100\%\) of the FIs number about \(99.7\%\) of the FIs. For the Connect dataset at \(30\%\) support, \(CR(FC) = 460\,412/1\,331\,880\,801 \approx 0.035\%\), which is an excellent compression rate: the itemsets that must be kept in memory to represent \(100\%\) of the FIs number only about \(0.035\%\) of the FIs.
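The arithmetic in note 5 can be reproduced with a few lines of Python. This sketch assumes, as the worked example suggests, that formula 9 defines CR(FC) as the number of FCIs divided by the number of FIs, expressed as a percentage; the function name `compression_rate` is hypothetical, and the counts are the ones quoted in the note.

```python
def compression_rate(n_closed: int, n_frequent: int) -> float:
    """Assumed form of CR(FC): |FCIs| / |FIs| as a percentage (lower is better)."""
    if n_frequent <= 0:
        raise ValueError("the number of frequent itemsets must be positive")
    if not 0 <= n_closed <= n_frequent:
        raise ValueError("FCIs are a subset of FIs, so 0 <= n_closed <= n_frequent")
    return 100.0 * n_closed / n_frequent


# Counts quoted in note 5 (Accidents at 20% support, Connect at 30% support).
accidents = compression_rate(887_441, 889_936)
connect = compression_rate(460_412, 1_331_880_801)
print(f"Accidents @ 20%: CR = {accidents:.2f}%")
print(f"Connect   @ 30%: CR = {connect:.3f}%")
```

This prints about 99.72% for Accidents and 0.035% for Connect. A lower value means a more compact representation, so the two datasets sit at opposite ends of the scale.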

References

  1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on very large data bases (VLDB ’94), Morgan Kaufmann Publishers Inc., pp 487–499

  2. Alves R, Rodríguez-Baena DS, Aguilar-Ruiz JS (2010) Gene association analysis: a survey of frequent pattern mining from gene expression data. Briefings Bioinform 11(2):210–224

  3. Burdick D, Calimlim M, Flannick J, Gehrke J, Yiu T (2005) MAFIA: a maximal frequent itemset algorithm. IEEE Trans Knowl Data Eng 17(11):1490–1504

  4. Deng Z, Lv S (2015) Prepost+: an efficient n-lists-based algorithm for mining frequent itemsets via children-parent equivalence pruning. Expert Syst Appl 42(13):5424–5432

  5. Deng Z, Wang Z (2010) A new fast vertical method for mining frequent patterns. Int J Comput Intell Syst 3(6):733–744

  6. Deng Z, Wang Z, Jiang J (2012) A new algorithm for fast mining frequent itemsets using n-lists. Sci China Inform Sci 55(9):2008–2030

  7. Djenouri Y, Djenouri D, Belhadi A, Fournier-Viger P, Lin JCW (2018) A new framework for metaheuristic-based frequent itemset mining. Appl Intell 48(12):4775–4791

  8. Dong G, Feng M, Son NT, Lee TS, Li J, Liu G, Wong L (2002) Pattern space projects. https://www.comp.nus.edu.sg/~wongls/projects/pattern-spaces/

  9. FIMI (2003) Frequent itemset mining dataset repository. http://fimi.cs.helsinki.fi/data/

  10. Fournier-Viger P, Lin JC, Gomariz A, Gueniche T, Soltani A, Deng Z, Lam HT (2016) The SPMF open-source data mining library version 2. In: LNCS, vol 9853, pp 36–40

  11. Goethals B, Zaki MJ (2004) Advances in frequent itemset mining implementations: Report on FIMI03. SIGKDD Explorat 6(1):109–117

  12. Grahne G, Zhu J (2005) Fast algorithms for frequent itemset mining using fp-trees. IEEE Trans Knowl Data Eng 17(10):1347–1362

  13. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. SIGMOD Rec 29(2):1–12

  14. Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining Knowl Disc 8(1):53–87

  15. Han J, Kamber M, Pei J (2011) Data Mining: Concepts and Techniques, chap 6, 3rd edn. Morgan Kaufmann Publishers, Burlington, pp 243–278

  16. Kryszkiewicz M (2001) Concise representation of frequent patterns based on disjunction-free generators. In: Proceedings 2001 IEEE international conference on data mining, pp 305–312

  17. Le T, Vo B (2015) An n-list-based algorithm for mining frequent closed patterns. Expert Syst Appl 42(19):6648–6657

  18. Li J, Li H, Wong L, Pei J, Dong G (2006) Minimum description length principle: generators are preferable to closed patterns. In: Proceedings of the 21st national conference on artificial intelligence - Volume 1, AAAI Press, pp 409–414

  19. Liu G, Li J, Wong L (2008) A new concise representation of frequent itemsets using generators and a positive border. Knowl Inf Syst 17(1):35–56

  20. Lucchese C, Orlando S, Perego R (2006) Fast and memory efficient mining of frequent closed itemsets. IEEE Trans Knowl Data Eng 18(1):21–36

  21. Nam H, Yun U, Yoon E, Lin JCW (2020) Efficient approach for incremental weighted erasable pattern mining with list structure. Expert Syst Appl 143:113087

  22. Pan F, Cong G, Tung AKH, Yang J, Zaki MJ (2003) Carpenter: Finding closed patterns in long biological datasets. In: Proceedings of the 9th ACM SIGKDD conference, pp 637–642

  23. Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Efficient mining of association rules using closed itemset lattices. Inf Syst 24(1):25–46

  24. Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: workshop on research issues in data mining and knowledge discovery, pp 21–30

  25. Pei J, Dong G, Zou W, Han J (2004) Mining condensed frequent-pattern bases. Knowl Inf Syst 6(5):570–594

  26. Sahoo J, Ashok KD, Goswami A (2015) An effective association rule mining scheme using a new generic basis. Knowl Inf Syst 43(1):127–156

  27. Sun J, Xun Y, Zhang J, Li J (2019) Incremental frequent itemsets mining with FCFP tree. IEEE Access 7:136511–136524

  28. Vo B, Hong TP, Le B (2012) DBV-Miner: a dynamic bit-vector approach for fast mining frequent closed itemsets. Expert Syst Appl 39(8):7196–7206

  29. Vo B, Le T, Coenen F, Hong T (2016) Mining frequent itemsets using the n-list and subsume concepts. Int J Mach Learn Cyber 7(2):253–265

  30. Vo B, Pham S, Le T, Deng Z (2017) A novel approach for mining maximal frequent patterns. Expert Syst Appl 73:178–186

  31. Wang J, Han J, Pei J (2003) Closet+: Searching for the best strategies for mining frequent closed itemsets. In: Proceedings of the 9th ACM SIGKDD conference, pp 236–245

  32. Xu Y, Li Y (2007) Generating concise association rules. In: Proceedings of the sixteenth ACM conference on information and knowledge management (CIKM ’07), pp 781–790

  33. Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390

  34. Zaki MJ, Hsiao C (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17(4):462–478

  35. Zhang C, Tian P, Zhang X, Liao Q, Jiang ZL, Wang X (2019) HashEclat: an efficient frequent itemset algorithm. Int J Mach Learn Cyber 10:3003–3016

Acknowledgements

The authors would like to thank the Editor, the Editor-in-Chief, and the anonymous reviewers for their constructive comments, which pointed to research directions and additional experimental work that greatly improved the manuscript. The authors would also like to thank Tuong Le and Bay Vo for providing the source code of NAFCP, and Sahoo Jayakrushna for the source code of DCI_PLUS.

Author information

Corresponding author

Correspondence to Makhlouf Ledmi.

About this article

Cite this article

Ledmi, M., Zidat, S. & Hamdi-Cherif, A. GrAFCI+ A fast generator-based algorithm for mining frequent closed itemsets. Knowl Inf Syst 63, 1873–1908 (2021). https://doi.org/10.1007/s10115-021-01575-3

  • DOI: https://doi.org/10.1007/s10115-021-01575-3
