Pattern-Growth Methods

Han, Jiawei; Pei, Jian

doi:10.1007/978-3-319-07821-2_3

Abstract

Mining frequent patterns has been a focal topic in data mining research in recent years, with the development of numerous interesting algorithms for mining association, correlation, causality, sequential patterns, partial periodicity, constraint-based frequent pattern mining, associative classification, emerging patterns, etc. Many studies adopt an Apriori-like, candidate generation-and-test approach. However, based on our analysis, candidate generation and test may still be expensive, especially when encountering long and numerous patterns.

A new methodology, called frequent pattern growth, which mines frequent patterns without candidate generation, has been developed. The method adopts a divide-and-conquer philosophy to project and partition databases based on the currently discovered frequent patterns and grow such patterns to longer ones in the projected databases. Moreover, efficient data structures have been developed for effective database compression and fast in-memory traversal. Such a methodology may eliminate or substantially reduce the number of candidate sets to be generated and also reduce the size of the database to be iteratively examined, and, therefore, lead to high performance.
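The divide-and-conquer idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain projected databases held as Python lists rather than a compressed structure, and the names `pattern_growth`, `grow`, and `min_support` are hypothetical choices made for this illustration. Each frequent item found in the current projected database extends the current pattern, and the search then recurses only into the transactions that contain that item.

```python
from collections import Counter


def pattern_growth(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    Illustrative sketch only: projection is done with plain lists,
    not with a compressed in-memory structure.
    """
    results = {}

    def grow(prefix, projected):
        # Count item supports inside the projected database only.
        counts = Counter(item for t in projected for item in t)
        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[frozenset(pattern)] = support
            # Project: keep transactions containing `item`, restricted to
            # items ordered after it, so each itemset is grown exactly once.
            suffix_db = [
                tuple(i for i in t if i > item) for t in projected if item in t
            ]
            grow(pattern, [t for t in suffix_db if t])

    # Deduplicate and sort items so each transaction counts an item once.
    grow((), [tuple(sorted(set(t))) for t in transactions])
    return results


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(
        pattern_growth(db, 3).items(), key=lambda kv: sorted(kv[0])
    ):
        print(sorted(itemset), support)
```

No candidate itemsets are generated and tested against the full database here: every pattern that is recorded was counted directly in a projected database, and that database shrinks as the pattern grows.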

In this paper, we provide an overview of this approach and examine its methodology and implications for mining several kinds of frequent patterns, including association, frequent closed itemsets, max-patterns, sequential patterns, and constraint-based mining of frequent patterns. We show that frequent pattern growth is efficient at mining large databases and that its further development may lead to scalable mining of many other kinds of patterns as well.

Author information

Authors and Affiliations

University of Illinois at Urbana-Champaign, 61801, Urbana, IL, USA
Jiawei Han
Simon Fraser University, V5A 1S6, Burnaby, BC, Canada
Jian Pei

Corresponding author

Correspondence to Jiawei Han.

Editor information

Editors and Affiliations

IBM, Yorktown Heights, New York, USA
Charu C. Aggarwal
University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
Jiawei Han

About this chapter

Cite this chapter

Han, J., Pei, J. (2014). Pattern-Growth Methods. In: Aggarwal, C., Han, J. (eds) Frequent Pattern Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-07821-2_3

DOI: https://doi.org/10.1007/978-3-319-07821-2_3
Published: 30 August 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07820-5
Online ISBN: 978-3-319-07821-2
eBook Packages: Computer Science, Computer Science (R0)
