From sequential pattern mining to structured pattern mining: A pattern-growth approach

Han, Jia-Wei; Pei, Jian; Yan, Xi-Feng

doi:10.1007/BF02944897

Knowledge and Data Processing
Published: May 2004

Volume 19, pages 257–279, (2004)
Journal of Computer Science and Technology

Jia-Wei Han¹,
Jian Pei² &
Xi-Feng Yan¹

Abstract

Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based sequential pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured patterns.
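To make the "combinatorially explosive" remark concrete, the following is a small illustrative sketch, not code from the paper. It enumerates every distinct non-empty subsequence of a single sequence by brute force. The function name `nonempty_subsequences` is hypothetical, and the sketch assumes each element of a sequence is a single item rather than an itemset.

```python
from itertools import combinations


def nonempty_subsequences(sequence):
    """Return the set of distinct non-empty subsequences of `sequence`.

    Assumption for illustration: every element is a single item.
    """
    found = set()
    n = len(sequence)
    for length in range(1, n + 1):
        # Each increasing tuple of positions selects one subsequence.
        for positions in combinations(range(n), length):
            found.add(tuple(sequence[p] for p in positions))
    return found


if __name__ == "__main__":
    for n in (4, 8, 12):
        items = list(range(n))  # n distinct items
        print(n, len(nonempty_subsequences(items)))
```

A sequence of n distinct items has 2^n - 1 non-empty subsequences, so the count roughly doubles with each added item. This is the growth that any method examining intermediate subsequences has to keep under control.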

In this study, we systematically introduce the pattern-growth methodology and study its principles and extensions. We first introduce two interesting pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient sequential pattern mining. Then we introduce gSpan for mining structured patterns using the same methodology. Their relative performance in large databases is presented and analyzed. Several extensions of these methods are also discussed in the paper, including mining multi-level, multi-dimensional patterns and mining constraint-based patterns.
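The prefix-projected pattern-growth idea behind PrefixSpan can be sketched in a few lines. This is a minimal sketch under simplifying assumptions, not the authors' implementation: each sequence element is a single item (the general setting allows itemsets), projected databases are held as (sequence index, offset) pairs instead of copied suffixes, and the names `prefixspan`, `grow`, and `min_support` are chosen here for illustration.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern: support} for all frequent sequential patterns.

    Assumption for illustration: each sequence is a list of single items.
    """
    results = {}

    def grow(prefix, projected):
        # `projected` holds one (sequence index, start offset) pair per
        # sequence containing `prefix`; the suffix from the offset onward
        # is that sequence's entry in the projected database.
        first_pos = defaultdict(dict)  # item -> {sequence index: position}
        for sid, start in projected:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                # Record only the first occurrence in each suffix.
                first_pos[seq[pos]].setdefault(sid, pos)
        for item in sorted(first_pos):
            occurrences = first_pos[item]
            if len(occurrences) < min_support:
                continue  # infrequent item: no pattern grows from it
            pattern = prefix + (item,)
            results[pattern] = len(occurrences)
            # Project on the grown prefix and recurse.
            grow(pattern, [(sid, pos + 1) for sid, pos in occurrences.items()])

    grow((), [(sid, 0) for sid in range(len(sequences))])
    return results


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
    for pattern, support in sorted(prefixspan(db, 2).items()):
        print(pattern, support)
```

Each recursive call scans only the suffixes that follow the current prefix, so patterns are grown from frequent items found in progressively smaller projected databases, and no candidate sequences are generated and tested against the full database.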

Author information

Authors and Affiliations

University of Illinois at Urbana-Champaign, 61801, Urbana, IL, U.S.A.
Jia-Wei Han & Xi-Feng Yan
State University of New York at Buffalo, 14260-2000, Buffalo, NY, U.S.A.
Jian Pei

Corresponding author

Correspondence to Jia-Wei Han.

Additional information

Survey

The work was supported in part by the Natural Sciences and Engineering Research Council of Canada, the Networks of Centres of Excellence of Canada, the Hewlett-Packard Lab, the U.S. National Science Foundation (Grant Nos. NSF IIS-02-09199, NSF IIS-03-08001), and the University of Illinois. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

Jia-Wei Han is a professor in computer science, University of Illinois at Urbana-Champaign. He has been working on research into data mining, data warehousing, and database systems, with over 250 conference and journal publications. He has chaired or served on the program committees of many international conferences, including ACM SIGKDD, ACM SIGMOD, VLDB, ICDE, ICDM, SDM, and EDBT. He also served or is serving on the editorial boards for Data Mining and Knowledge Discovery, IEEE Transactions on Knowledge and Data Engineering, Journal of Intelligent Information Systems, and Journal of Computer Science and Technology. He is currently serving on the Board of Directors for the Executive Committee of ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). He has received the Outstanding Contribution Award at the 2002 ICDM, ACM Service Award, and IBM Faculty Awards. He is an ACM Fellow and the first author of the textbook “Data Mining: Concepts and Techniques” (Morgan Kaufmann, 2001).

Jian Pei received the B.Eng. and the M.Eng. degrees, both in computer science, from Shanghai Jiaotong University, China, in 1991 and 1993, respectively, and the Ph.D. degree in computing science from Simon Fraser University, Canada, in 2002. He was a Ph.D. candidate at Peking University in 1997–1999.

He is currently an assistant professor of computer science and engineering at the State University of New York at Buffalo, USA. He is a participating faculty member in the Center of Unified Biometrics and Sensors (CUBS) at the State University of New York at Buffalo. His research interests include data mining, data warehousing, online analytical processing, database systems, and bioinformatics. His current research is supported in part by the National Science Foundation (NSF).

He has published over 40 research papers in refereed journals, conferences, and workshops, served on the program committees of over 30 international conferences and workshops, and been a reviewer for some leading academic journals. He is a member of the ACM, the ACM SIGMOD, the ACM SIGKDD and the IEEE Computer Society, and a guest area editor of the Journal of Computer Science and Technology.

Xi-Feng Yan received a B.E. degree in computer engineering from Zhejiang University, China, in 1997, and an M.S. degree in computer science from the State University of New York at Stony Brook, NY, in 2001. He is currently a Ph.D. candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. His research interests include data mining, structural/graph pattern mining, and their applications in database systems and bioinformatics.

About this article

Cite this article

Han, JW., Pei, J. & Yan, XF. From sequential pattern mining to structured pattern mining: A pattern-growth approach. J. Comput. Sci. & Technol. 19, 257–279 (2004). https://doi.org/10.1007/BF02944897

Received: 19 January 2004
Issue Date: May 2004
DOI: https://doi.org/10.1007/BF02944897
