Frequent closed itemset based algorithms: a thorough structural and analytical survey

Published: 01 June 2006

Abstract

As a side effect of the digitization of unprecedented amounts of data, traditional retrieval tools have proved unable to extract hidden and valuable knowledge. Data mining, which promises adequate tools and techniques for this task, is the discovery of hidden information in datasets. In this paper, we present a structural and analytical survey of frequent closed itemset (FCI) based algorithms for mining association rules. We provide a structural classification of these algorithms into four categories and compare them using criteria that we introduce. We also present an analytical comparison of FCI-based algorithms on benchmark dense and sparse datasets as well as on "worst case" datasets. Going beyond classical performance analysis, we focus on memory consumption and on the advantages and limitations of the optimization strategies used in FCI-based algorithms.
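For readers unfamiliar with the term, the following is a minimal sketch of what a frequent closed itemset is, using the standard definition: an itemset is frequent if its support (the number of transactions containing it) reaches a minimum threshold, and closed if no proper superset has the same support. The toy transactions, the threshold, and the function names `support` and `frequent_closed_itemsets` are assumptions made for illustration. The brute-force enumeration is not any of the surveyed algorithms, which exist precisely to avoid this exhaustive search.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_closed_itemsets(transactions, min_support):
    """Brute-force enumeration; only suitable for tiny illustrative inputs."""
    items = sorted(set().union(*transactions))

    # Step 1: collect every non-empty frequent itemset with its support.
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                frequent[candidate] = count

    # Step 2: keep only itemsets with no proper superset of equal support.
    # Any such superset is itself frequent, so searching `frequent` suffices.
    closed = {}
    for itemset, count in frequent.items():
        absorbed = any(
            itemset < other and count == other_count
            for other, other_count in frequent.items()
        )
        if not absorbed:
            closed[itemset] = count
    return closed


# Hypothetical toy dataset: four transactions over the items a, b, c.
transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("abc"),
    frozenset("c"),
]
closed = frequent_closed_itemsets(transactions, min_support=2)
for itemset, count in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

On this toy input, all seven non-empty itemsets are frequent at a threshold of 2, but only {a, b}, {c}, and {a, b, c} are closed. This illustrates why closed itemsets serve as a condensed representation: the support of every frequent itemset can be recovered from its smallest closed superset.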

Published in ACM SIGKDD Explorations Newsletter, Volume 8, Issue 1, June 2006, 104 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/1147234

Copyright © 2006 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
