Skip to main content
Log in

Mining Condensed Frequent-Pattern Bases

  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Frequent-pattern mining has been studied extensively and has many useful applications. However, frequent-pattern mining often generates too many patterns to be truly efficient or effective. In many applications, it is sufficient to generate and examine frequent patterns with a sufficiently good approximation of the support frequency instead of in full precision. Such a compact but “close-enough” frequent-pattern base is called a condensed frequent-pattern base.

In this paper, we propose and examine several alternatives for the design, representation, and implementation of such condensed frequent-pattern bases. Several algorithms for computing such pattern bases are proposed. Their effectiveness at pattern compression and methods for efficiently computing them are investigated. A systematic performance study is conducted on different kinds of databases, and demonstrates the effectiveness and efficiency of our approach in handling frequent-pattern mining in large databases.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Agarwal R, Aggarwal C, Prasad VVV (2001) A tree projection algorithm for generation of frequent itemsets. J Parallel Distrib Comput 61:350–371

    Article  MATH  Google Scholar 

  2. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proc 1994 Int Conf Very Large Data Bases (VLDB’94), Santiago, Chile, pp 487–499

  3. Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proc 1995 Int Conf Data Engineering (ICDE’95), Taipei, Taiwan, pp 3–14

  4. Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proc 1998 ACM-SIGMOD Int Conf Management of Data (SIGMOD’98), Seattle, USA, pp 85–93

  5. Beil F, Ester M, Xu X (2002) Frequent term-based text clustering. In: Proc 2002 ACM SIGKDD Int Conf Knowledge Discovery in Databases (KDD’02), Edmonton, Canada, pp 436–442

  6. Beyer K, Ramakrishnan R (1999) Bottom-up computation of sparse and iceberg cubes. In: Proc 1999 ACM-SIGMOD Int Conf Management of Data (SIGMOD’99), Philadelphia, USA, pp 359–370

  7. Boulicaut JF, Bykowski A, Rigotti C (2000) Approximation of frequency queries by means of free-sets. In: Principles of Data Mining and Knowledge Discovery, pp 75–85

  8. Brin S, Motwani R, Silverstein C (1997) Beyond market basket: generalizing association rules to correlations. In: Proc 1997 ACM-SIGMOD Int Conf Management of Data (SIGMOD’97), Tucson, USA, pp 265–276

  9. Burdick D, Calimlim M, Gehrke J (2001) MAFIA: a maximal frequent itemset algorithm for transactional databases. In: Proc 2001 Int Conf Data Engineering (ICDE’01), Heidelberg, Germany, pp 443–452

  10. Dong G, Han J, Lam J, Pei J, Wang K (2001) Mining multi-dimensional constrained gradients in data cubes. In: Proc 2001 Int Conf Very Large Data Bases (VLDB’01), Rome, Italy, pp 321–330

  11. Dong G, Li J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proc 1999 Int Conf Knowledge Discovery and Data Mining (KDD’99), San Diego, USA, pp 43–52

  12. Gouda K, Zaki MJ (2001) Efficiently mining maximal frequent itemsets. In: ICDM, pp 163–170

  13. Han J, Dong G, Yin Y (1999) Efficient mining of partial periodic patterns in time series database. In: Proc 1999 Int Conf Data Engineering (ICDE’99), Sydney, Australia, pp 106–115

  14. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proc 2000 ACM-SIGMOD Int Conf Management of Data (SIGMOD’00), Dallas, USA, pp 1–12

  15. Imielinski T, Khachiyan L, Abdulghani A (2002) Cubegrades: generalizing association rules. Data Min Knowl Discovery 6:219–258

    Article  Google Scholar 

  16. Kamber M, Han J, Chiang JY (1997) Metarule-guided mining of multi-dimensional association rules using data cubes. In: Proc 1997 Int Conf Knowledge Discovery and Data Mining (KDD’97), Newport Beach, USA, pp 207–210

  17. Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Proc 1998 Int Conf Knowledge Discovery and Data Mining (KDD’98), New York, USA, pp 80–86

  18. Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proc 2001 Int Conf Data Mining (ICDM’01), San Jose, USA, pp 369–376

  19. Lakshmanan LVS, Ng R, Han J, and Pang A (1999) Optimization of constrained frequent set queries with 2-variable constraints. In: Proc 1999 ACM-SIGMOD Int Conf Management of Data (SIGMOD’99), Philadelphia, USA, pp 157–168

  20. Lent B, Swami A, Widom J (1997) Clustering association rules. In: Proc 1997 Int Conf Data Engineering (ICDE’97), Birmingham, England, pp 220–231,

  21. Mannila H, Toivonen H (1996) Multiple uses of frequent sets and condensed representations (extended abstract). In: Knowledge Discovery and Data Mining, pp 189–194

  22. Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discovery 1:259–289

    Article  Google Scholar 

  23. Margaritis D, Faloutsos C, Thrun S (2001) Netcube: a scalable tool for fast data mining and compression. In: Proc 2001 Int Conf Very Large Data Bases (VLDB’01), pp 311–320

  24. Ng R, Lakshmanan LVS, Han J, Pang A (1998) Exploratory mining and pruning optimizations of constrained associations rules. In: Proc 1998 ACM-SIGMOD Int Conf Management of Data (SIGMOD’98), Seattle, USA, pp 13–24

  25. Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proc 7th Int Conf Database Theory (ICDT’99), Jerusalem, Israel, pp 398–416

  26. Pei J, Han J, Lakshmanan LVS (2001) Mining frequent itemsets with convertible constraints. In: Proc 2001 Int Conf Data Engineering (ICDE’01), Heidelberg, Germany, pp 433–332

  27. Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proc 2000 ACM-SIGMOD Int Workshop Data Mining and Knowledge Discovery (DMKD’00), Dallas, USA, pp 11–20

  28. Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M-C (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proc 2001 Int Conf Data Engineering (ICDE’01), Heidelberg, Germany, pp 215–224

  29. Silverstein C, Brin S, Motwani R, Ullman J (1998) Scalable techniques for mining causal structures. In: Proc 1998 Int Conf Very Large Data Bases (VLDB’98), New York, USA, pp 594–605

  30. Zaki M (2000) Generating non-redundant association rules. In: Proc 2000 ACM SIGKDD Int Conf Knowledge Discovery in Databases (KDD’00), Boston, USA, pp 34–43

  31. Zaki MJ, Hsiao CJ (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proc 2002 SIAM Int Conf Data Mining, Arlington, USA, pp 457–473

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jian Pei.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Pei, J., Dong, G., Zou, W. et al. Mining Condensed Frequent-Pattern Bases. Know. Inf. Sys. 6, 570–594 (2004). https://doi.org/10.1007/s10115-003-0133-6

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-003-0133-6

Keywords

Navigation