Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy

  • Regular Paper
Journal of Computer Science and Technology

Abstract

This paper presents new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations of all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items rather than a flat list of items. This produces more natural frequent itemsets and associations, such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in mining patterns: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; but with large threshold values, some interesting patterns and associations fail to be identified.
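As an illustration of the first concept, the following minimal Python sketch computes the support of a g-itemset by extending each transaction with the ancestors of its items. The taxonomy, the transactions, and the helper names (`PARENT`, `ancestors`, `extend`, `support`) are assumptions made for this example; this is the textbook definition of generalized support, not the algorithm developed in the paper.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {"beef": "meat", "chicken": "meat", "milk": "dairy"}

# Hypothetical transactions over leaf items.
TRANSACTIONS = [
    {"beef", "milk"},
    {"chicken", "milk"},
    {"chicken"},
    {"milk"},
]


def ancestors(item):
    """Return the item together with all of its ancestors in the taxonomy."""
    found = {item}
    while item in PARENT:
        item = PARENT[item]
        found.add(item)
    return found


def extend(transaction):
    """Add every ancestor of every item to the transaction."""
    extended = set()
    for item in transaction:
        extended |= ancestors(item)
    return extended


def support(g_itemset, transactions):
    """Fraction of transactions whose extended form contains the g-itemset."""
    wanted = set(g_itemset)
    hits = sum(1 for t in transactions if wanted <= extend(t))
    return hits / len(transactions)


if __name__ == "__main__":
    for candidate in [("meat", "milk"), ("beef", "milk"), ("chicken", "milk")]:
        print(candidate, support(candidate, TRANSACTIONS))
```

With this toy data and a minimum support of 0.5, (meat, milk) is frequent while neither (beef, milk) nor (chicken, milk) is, which is the kind of association a flat item list would miss.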

Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of times faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with depths of seven or more. Experimental results for previous algorithms were limited to taxonomies with depth at most three or four.
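To give a feel for why the expanded set is so much larger, here is a small Python sketch of one way such an expansion could work. It assumes only the standard monotonicity properties: every non-empty subset of a frequent g-itemset is frequent, and replacing an item by one of its ancestors cannot lower support. The taxonomy and the names `PARENT`, `generalizations`, and `expand` are hypothetical; the paper's own expansion procedure may differ.

```python
from itertools import combinations, product

# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {"beef": "meat", "chicken": "meat", "milk": "dairy"}


def generalizations(item):
    """Return the item followed by each of its ancestors, nearest first."""
    chain = [item]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def expand(max_itemsets):
    """Enumerate the g-itemsets implied by a collection of max frequent ones.

    Takes every non-empty subset of each max g-itemset and every way of
    replacing its items by ancestors. Candidates that would contain an
    item together with one of its own ancestors are skipped as redundant.
    """
    result = set()
    for max_itemset in max_itemsets:
        items = sorted(max_itemset)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                options = [generalizations(i) for i in subset]
                for choice in product(*options):
                    candidate = frozenset(choice)
                    if len(candidate) != len(choice):
                        continue  # two items collapsed to the same ancestor
                    if any(a in candidate
                           for i in candidate
                           for a in generalizations(i)[1:]):
                        continue  # item and its ancestor in the same set
                    result.add(candidate)
    return result


if __name__ == "__main__":
    for itemset in sorted(expand([{"beef", "milk"}]), key=sorted):
        print(sorted(itemset))
```

Even in this tiny case, a single max frequent g-itemset of two items over a depth-one taxonomy expands to eight frequent g-itemsets; the count grows quickly with itemset length and taxonomy depth.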

For each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classification-based algorithm is developed. In particular, the two classification-based algorithms are MFGI_class, for mining max frequent g-itemsets, and EGR_class, for mining essential g-rules. The classification-based algorithms feature conceptual classification trees and dynamic generation and pruning algorithms.

References

  1. Hipp J, Myka A, Wirth R, Güntzer U. A new algorithm for faster mining of generalized association rules. In Proc. European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), Nantes, France, 1998, pp.74–82.

  2. Pramudiono I, Kitsuregawa M. FP-tax: Tree structure based generalized association rule mining. In Proc. ACM/SIGMOD International Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD), Paris, France, 2004, pp.60–63.

  3. Srikant R, Agrawal R. Mining generalized association rules. In Proc. International Conference on Very Large Data Bases (VLDB), Zurich, Switzerland, 1995, pp.407–419.

  4. Sriphaew K, Theeramunkong T. A new method for finding generalized frequent itemsets in generalized association rule mining. In Proc. International Symposium on Computers and Communications (ISCC), Taormina, Italy, 2002, pp.1040–1045.

  5. Sriphaew K, Theeramunkong T. Fast algorithms for mining generalized frequent patterns of generalized association rules. IEICE Transactions on Information and Systems, March 2004, E87-D(3).

  6. Sriphaew K, Theeramunkong T. Mining generalized closed frequent itemsets of generalized association rules. In Proc. International Conference on Knowledge-Based Intelligent Information and Engineering Systems (KES), Oxford, United Kingdom, 2003, pp.476–484.

  7. Bayardo Jr R J. Efficiently mining long patterns from databases. In Proc. ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), Seattle, WA, 1998, pp.85–93.

  8. Agarwal R C, Aggarwal C C, Prasad V V V. A tree projection algorithm for generation of frequent item sets. Journal of Parallel and Distributed Computing, 2001, 61(3): 350–371.

  9. Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In Proc. ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), Dallas, TX, 2000, pp.1–12.

  10. Lin D I, Kedem Z M. Pincer-Search: An efficient algorithm for discovering the maximum frequent set. IEEE Trans. Knowledge and Data Engineering (TKDE), 2002, 14(3): 553–566.


  11. Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules. In Proc. International Conference on Database Theory (ICDT), Jerusalem, Israel, 1999, pp.398–416.

  12. Pei J, Han J, Mao R. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proc. ACM/SIGMOD International Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD), Dallas, TX, 2000, pp.21–30.

  13. Wang K, Tang L, Han J, Liu J. Top down FP-growth for association rule mining. In Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Taipei, Taiwan, 2002, pp.334–340.

  14. Agrawal R, Imielinski T, Swami A M. Mining association rules between sets of items in large databases. In Proc. ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), Washington DC, 1993, pp.207–216.

  15. Agarwal R C, Aggarwal C C, Prasad V V V. Depth first generation of long patterns. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Boston, MA, 2000, pp.108–118.

  16. Burdick D, Calimlim M, Gehrke J. MAFIA: A maximal frequent itemset algorithm for transactional databases. In Proc. International Conference on Data Engineering (ICDE), Heidelberg, Germany, 2001, pp.443–452.

  17. Gouda K, Zaki M J. Efficiently mining maximal frequent itemsets. In Proc. International Conference on Data Mining (ICDM), San Jose, CA, 2001, pp.163–170.

  18. Xin D, Han J, Yan X, Cheng H. Mining compressed frequent-pattern sets. In Proc. International Conference on Very Large Data Bases (VLDB), Trondheim, Norway, 2005, pp.709–720.

  19. Yan X, Cheng H, Han J, Xin D. Summarizing itemset patterns: A profile-based approach. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Chicago, IL, 2005, pp.314–323.

  20. Calders T, Goethals B. Depth-first non-derivable itemset mining. In Proc. the SIAM International Conference on Data Mining (SDM), Newport Beach, CA, 2005.

  21. Ke Y, Cheng J, Ng W. Mining quantitative correlated patterns using an information-theoretic approach. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Philadelphia, PA, 2006, pp.227–236.

  22. Xiong H, Tan P N, Kumar V. Hyperclique pattern discovery. Data Mining and Knowledge Discovery, 2006, 13(2): 219–242.


  23. Ghoting A, Buehrer G, Parthasarathy S, Kim D, Nguyen A, Chen Y K, Dubey P. Cache-conscious frequent pattern mining on a modern processor. In Proc. International Conference on Very Large Data Bases (VLDB), Trondheim, Norway, 2005, pp.577–588.

  24. Han J, Fu Y. Mining multiple-level association rules in large databases. IEEE Trans. Knowledge and Data Engineering (TKDE), 1999, 11(5): 798–805.


  25. Huang Y F, Wu C M. Mining generalized association rules using pruning techniques. In Proc. International Conference on Data Mining (ICDM), Maebashi City, Japan, 2002, pp.227–234.

  26. Aggarwal C C, Yu P S. Online generation of association rules. In Proc. International Conference on Data Engineering (ICDE), Orlando, FL, 1998, pp.402–411.

  27. Zaki M J. Generating non-redundant association rules. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Boston, MA, 2000, pp.34–43.

  28. Lui C L, Chung K F. Discovery of generalized association rules with multiple minimum supports. In Proc. European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), Lyon, France, 2000, pp.510–515.

  29. Tseng M C, Lin W Y. Mining generalized association rules with multiple minimum supports. In Proc. International Conference on Data Warehousing and Knowledge Discovery (DaWaK), Munich, Germany, 2001, pp.11–20.

  30. Newman D J, Asuncion A. UCI machine learning repository. University of California, Irvine, 2007, http://mlearn.ics.uci.edu/MLRepository.html.

  31. Synthetic Data Generation Code for Associations and Sequential Patterns (IBM Almaden Research Center). http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html.

  32. Kunkle D, Zhang D, Cooperman G. Efficient mining of max frequent patterns in a generalized environment. In Proc. International Conference on Information and Knowledge Management (CIKM), Arlington, VA, 2006, pp.810–811.

Author information

Corresponding author

Correspondence to Daniel Kunkle.

Additional information

A shorter version of this work appeared in CIKM’06 as a two-page poster [32].

About this article

Cite this article

Kunkle, D., Zhang, D. & Cooperman, G. Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy. J. Comput. Sci. Technol. 23, 77–102 (2008). https://doi.org/10.1007/s11390-008-9107-1

  • DOI: https://doi.org/10.1007/s11390-008-9107-1
