Abstract
As a side effect of the digitalization of unprecedented amount of data, traditional retrieval tools proved to be unable to extract hidden and valuable knowledge. Data Mining, with a clear promise to provide adequate tools and/or techniques to do so, is the discovery of hidden information that can be retrieved from datasets. In this paper, we present a structural and analytical survey of <u>f</u>requent <u>c</u>losed <u>i</u>temset (FCI) based algorithms for mining association rules. Indeed, we provide a structural classification, in four categories, and a comparison of these algorithms based on criteria that we introduce. We also present an analytical comparison of FCI-based algorithms using benchmark dense and sparse datasets as well as "worst case" datasets. Aiming to stand beyond classical performance analysis, we intend to provide a focal point on performance analysis based on memory consumption and advantages and/or limitations of optimization strategies, used in the FCI-based algorithms.
- C. C. Aggarwal. Towards long pattern generation in dense databases. In ACM-SIGKDD Explorations, volume 3(1), pages 20--26, July 2001.]] Google ScholarDigital Library
- R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In J. B. Bocca, M. Jarke, and C. Zaniolo, editors, Proceedings of the 20th International Conference on Very Large Databases, Santiago, Chile, pages 478--499, June 1994.]] Google ScholarDigital Library
- Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In Proceedings of the 1st International Conference on Computational Logic (DOOD 2000), Springer-Verlag, LNAI, volume 1861, London, UK, pages 972--986, July 2000.]] Google ScholarDigital Library
- Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, and L. Lakhal. Mining frequent patterns with counting inference. In Proceeding of the 6th ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, Massachusetts, USA, volume 2(2), pages 66--75, 20-23 August 2000.]] Google ScholarDigital Library
- S. Ben Yahia, Y. Slimani, and J. Rezgui. A divide and conquer approach for deriving partially ordered sub-structures. In Proceedings of the International 9th Pacific-Asia Conference on Knowledge Data Discovery (PAKDD 2005), LNAI, volume 3518, Springer-Verlag, Hanoi, Vietnam, pages 91--96, 18-20 May 2005.]] Google ScholarDigital Library
- J.-F. Boulicaut, A. Bykowski, and C. Rigotti. Free-sets: A condensed representation of boolean data for the approximation of frequency queries. In Jounal of Data Mining and Knowledge Discovery (DMKD), 7 (1):5--22, 2003.]] Google ScholarDigital Library
- T. Calders and B. Goethals. Mining all non-derivable frequent itemsets. In T. Elomaa, H. Mannila, and H. Toivonen, editors, Proceedings of the 6th European Conference on Principles of Knowledge Discovery and Data Mining (PKDD 2002), LNCS, volume 2431, Springer-Verlag, Helsinki, Finland, pages 74--85, 19-23 August 2002.]] Google ScholarDigital Library
- T. Calders, C. Rigotti, and J.-F. Boulicaut. A survey on condensed representations for frequent sets. In Constraint Based Mining, Springer-Verlag, LNAI, volume 3848, pages 64--80, 2006.]] Google ScholarDigital Library
- A. Casali, R. Cicchetti, and L. Lakhal. Essential patterns: A perfect cover of frequent patterns. In A Min Tjoa and J. Trujillo, editors, Proceedings of the 7th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2005), Springer-Verlag, LNCS, volume 3589, Copenhagen, Denmark, pages 428--437, 22-26 August 2005.]] Google ScholarDigital Library
- G. Cong, K. H. Tung, X. Xu, F. Pan, and J. Yang. FARMER: finding interesting rule groups in microarray datasets. In Proceedings of the 2004 ACM SIGMOD International conference on Management of data, Paris, France, pages 143--154, 2004.]] Google ScholarDigital Library
- M. El-Hajj and O. Zaiane. Finding all frequent patterns starting from the closure. In the International Conference on Advanced Data Mining and Applications, Wuhan, China, pages 67--74, July 2005.]] Google ScholarDigital Library
- F. Flouvat, F. De Marchi, and J-M. Petit. A thorough experimental study of datasets for frequent itemsets. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005), New Orleans, USA, pages 162--169, November 2005.]] Google ScholarDigital Library
- H. Fu and E. Mephu Nguifo. Partitioning large data to scale up lattice-based algorithm. In Proceedings of IEEE International Conference on Tools with Artificial Intelligence (IC-TAI 2003), Sacramento, California, USA, pages 537--541, November 2003.]] Google ScholarDigital Library
- J. Galambos and I. Simonelli. Bonferroni-type inequalities with applications. Springer-Verlag, 2000.]]Google Scholar
- B. Ganter and R. Wille. Formal Concept Analysis. Springer-Verlag, 1999.]]Google ScholarCross Ref
- B. Goethals and M. J. Zaki. FIMI'03: Workshop on frequent itemset mining implementations. In B. Goethals and M. J. Zaki, editors, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2003), volume 90 of CEUR Workshop Proceedings, Melbourne, Florida, USA, 19 November 2003.]]Google Scholar
- G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. In B. Goethals and M. J. Zaki, editors, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2003), volume 90 of CEUR Workshop Proceedings, Melbourne, Florida, USA, 19 November 2003.]]Google Scholar
- T. Hamrouni, S. BenYahia, and Y. Slimani. Avoiding the itemset closure computation "pitfall". In Proceedings of the 3rd International Conference on Concept Lattices and their Applications (CLA 2005), Olomouc, Czech Republic, pages 46--59, 7-9 September 2005.]]Google Scholar
- J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (SIG-MOD'00), Dallas, Texas, USA, pages 1--12, May 2000.]] Google ScholarDigital Library
- J. Hipp, U. Gntzer, and G. Nakhaeizadeh. Algorithms for association rule mining - a general survey and comparison. In ACM-SIGKDD Explorations, volume 2(1), pages 58--64, July 2000.]] Google ScholarDigital Library
- M. Kryszkiewicz. Concise representation of frequent patterns based on disjunction-free generators. In Proceedings of the 1st IEEE International Conference on Data Mining (ICDM 2001), San Jose, California, USA, pages 305--312, 2001.]] Google ScholarDigital Library
- S. O. Kuznetsov and S. A. Ob"edkov. Comparing performance of algorithms for generating concept lattices. In Journal of Experimental and Theoretical Artificial Intelligence (JETAI), volume 14(2-3), pages 189--216, 2002.]]Google ScholarCross Ref
- L. Lhote, F. Rioult, and A. Soulet. Average number of frequent and closed pattern in random databases. In Proceedings of the 7th Conference Francophone d'Apprentissage Automatique (CAp 2005), Presses Universitaires de Grenoble, Nice, France, pages 345--360, 30 May - 03 June 2005.]]Google Scholar
- C. Lucchese, S. Orlando, P. Palmerini, R. Perego, and F. Silvestri. kDCI: A multi-strategy algorithm for mining frequent sets. In B. Goethals and M. J. Zaki, editors, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2003), volume 90 of CEUR Workshop Proceedings, Melbourne, Florida, USA, 19 November 2003.]]Google Scholar
- C. Lucchese, S. Orlando, and R. Perego. Fast and memory efficient mining of frequent closed itemsets. In IEEE Journal Transactions on Knowledge and Data Engineering (TKDE), 18 (1):21--36, January 2006.]] Google ScholarDigital Library
- C. Lucchesse, S. Orlando, and R. Perego. DCI-CLOSED: A fast and memory efficient algorithm to mine frequent closed itemsets. In B. Goethals, M. J. Zaki, and R. Bayardo, editors, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2004), volume 126 of CEUR Workshop Proceedings, Brighton, UK, 1 November 2004.]]Google Scholar
- E. Mephu Nguifo. Gallois lattice: A framework for concept learning, design, evaluation and refinement. In Proceedings of IEEE International Conference on Tools with Artificial Intelligence (ICTAI 1994), New-Orleans, USA, pages 461--467, November 1994.]]Google Scholar
- F. Pan, G. Cong, K. H. Tung, J. Yang, and M. J. Zaki. CARPENTER: finding closed patterns in long biological datasets. In Proceedings of the 9th ACM SIGKDD International conference on Knowledge discovery and data mining, pages 637--642, 2003.]] Google ScholarDigital Library
- N. Pasquier. Datamining: Algorithmes d'extraction et de réduction des règles d'association dans les bases de données. Thèse de doctorat, École Doctorale Sciences pour l'Ingénieur de Clermont Ferrand, Université Clermont Ferrand II, France, Janvier 2000.]]Google Scholar
- N. Pasquier. Mining association rules using formal concept analysis. In Proceedings of the 8th International Conference on Conceptual Structures (ICCS 2000), Springer-Verlag, LNAI, volume 1867, Darmstadt, Germany, pages 259--264, August 2000.]]Google Scholar
- N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Efficient mining of association rules using closed itemset lattices. Journal of Information Systems, 24(1):25--46, 1999.]] Google ScholarDigital Library
- N. Pasquier, Y. Bastide, R. Touil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In C. Beeri and P. Buneman, editors, Proceedings of 7th International Conference on Database Theory (ICDT 1999), LNCS, volume 1540, Springer-Verlag, Jerusalem, Israel, pages 398--416, January 1999.]] Google ScholarDigital Library
- J. Pei, J. Han, and R. Mao. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proceedings of the ACM-SIGMOD International Workshop on Data Mining and Knowledge Discovery (DMKD 2000), Dallas, Texas, USA, pages 21--30, 2000.]]Google Scholar
- R. Srikant. Fast algorithms for mining association rules and sequential patterns. Ph. D dissertation, University of Wisconsin, Madison, USA, 1996.]] Google ScholarDigital Library
- G. Stumme, R. Taouil, Y. Bastide, N. Pasquier, and L. Lakhal. Computing iceberg concept lattices with TITANIC. Journal on Knowledge and Data Engineering (KDE), 2(42):189--222, 2002.]] Google ScholarDigital Library
- T. Uno, T. Asai, Y. Uchida, and H. Arimura. An efficient algorithm for enumerating closed patterns in transaction databases. Proceedings of the 7th International Conference on Discovery Science, Padova, Italy, pages 16--31, 2-5 October 2004.]]Google ScholarCross Ref
- T. Uno, M. Kiyomi, and H. Arimura. LCM ver. 2: Efficient mining algorithms for frequent/closed/maximal itemsets. In B. Goethals, M. J. Zaki, and R. Bayardo, editors, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2004), volume 126 of CEUR Workshop Proceedings, Brighton, UK, 1 November 2004.]]Google Scholar
- P. Valtchev, R. Missaoui, and P. Lebrun. A fast algorithm for building the Hasse diagram of a Galois lattice. In Proceedings of the Colloque LaCIM 2000, Montréal, Canada, pages 293--306, September 2000.]]Google Scholar
- J. Wang, J. Han, and J. Pei. CLOSET+: Searching for the best strategies for mining frequent closed itemsets. In Proceedings of the 9th ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2003), Washington D. C., USA, pages 236--245, 24-27 August 2003.]] Google ScholarDigital Library
- R. Wille. Restructuring lattices theory: An approach based on hierarchies of concepts. I. Rival, editor, Ordered Sets, Reidel, Dordrecht-Boston, p. 445--470, 1982.]]Google Scholar
- M. J. Zaki. Parallel and distributed association mining: A survey. In IEEE Journal Concurrency, 7 (4):14--25, October 1999.]] Google ScholarDigital Library
- M. J. Zaki and C. J. Hsiao. CHARM: An efficient algorithm for closed itemset mining. In Proceedings of the 2nd SIAM International Conference on Data Mining, Arlington, Virginia, USA, pages 34--43, April 2002.]]Google ScholarCross Ref
- Z. Zheng, R. Kohavi, and L. Mason. Real world performance of association rule algorithms (long version). Available at http://ai.stanford.edu/users/ronnyk/realworldassoclong-paper.pdf. Accessed on February 15th 2006.]]Google Scholar
- Z. Zheng, R. Kohavi, and L. Mason. Real world performance of association rule algorithms. In F. Provost and R. Srikant, editors, Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, pages 401--406, August 2001.]] Google ScholarDigital Library
- J. Zhu. Efficiently mining frequent itemsets from very large databases. Ph. D thesis, University of Concordia, Montréal, Québec, Canada, September 2004.]] Google ScholarDigital Library
Index Terms
- Frequent closed itemset based algorithms: a thorough structural and analytical survey
Recommendations
Frequent Closed Informative Itemset Mining
CIS '07: Proceedings of the 2007 International Conference on Computational Intelligence and SecurityIn recent years, cluster analysis and association analysis have attracted a lot of attention for large data analysis such as biomedical data analysis. This paper proposes a novel algorithm of frequent closed itemset mining. The algorithm addresses two ...
Fast Algorithms for Frequent Itemset Mining Using FP-Trees
Efficient algorithms for mining frequent itemsets are crucial for mining association rules as well as for many other data mining tasks. Methods for mining frequent itemsets have been implemented using a prefix-tree structure, known as an FP-tree, for ...
An Analytical Study on Frequent Itemset Mining Algorithms
MIKE 2013: Proceedings of the First International Conference on Mining Intelligence and Knowledge Exploration - Volume 8284Data mining is the process of collecting, extracting and analyzing large data set from different perspectives. Fundamental and important task of data mining is the mining of frequent itemsets. Frequent itemsets play an important role in association rule ...
Comments