From sequential pattern mining to structured pattern mining: A pattern-growth approach

  • Knowledge and Data Processing
Journal of Computer Science and Technology

Abstract

Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based sequential pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured patterns.
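To make the "test" half of generation-and-test concrete, the sketch below counts the support of one candidate subsequence by scanning a small sequence database. It is only an illustration under simplifying assumptions: each sequence element is a single item rather than an itemset, and the names `is_subsequence` and `support` are hypothetical, not taken from GSP or SPADE.

```python
from typing import List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `sequence` in order, gaps allowed."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the matched item,
    # so later pattern items must appear after earlier ones.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> int:
    """Count the sequences in `database` that contain `pattern`."""
    return sum(1 for sequence in database if is_subsequence(pattern, sequence))


# Toy database (an assumption for illustration only).
toy_db = [list("abcb"), list("acb"), list("bca")]
print(support(["a", "b"], toy_db))
```

Every candidate needs a check like this against the data, which is why the number of candidates generated matters so much for the cost of mining.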

In this study, we systematically introduce the pattern-growth methodology and study its principles and extensions. We first introduce two pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient sequential pattern mining. Then we introduce gSpan for mining structured patterns using the same methodology. Their relative performance on large databases is presented and analyzed. Several extensions of these methods are also discussed, including mining multi-level, multi-dimensional patterns and mining constraint-based patterns.
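The pattern-growth idea can be sketched in a few lines: grow a frequent prefix one item at a time, and at each step look only at the suffixes of the sequences that contain the prefix (the projected database). The code below is a minimal sketch in the spirit of PrefixSpan, not the authors' implementation. It assumes each sequence element is a single item rather than an itemset, and the function name `prefixspan` and its parameters are chosen here for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def prefixspan(database: List[Sequence[str]],
               min_support: int) -> Dict[Tuple[str, ...], int]:
    """Return every frequent sequential pattern with its support count."""
    patterns: Dict[Tuple[str, ...], int] = {}

    def grow(prefix: Tuple[str, ...], projected: List[Tuple[int, int]]) -> None:
        # `projected` holds (sequence index, suffix start) pairs, so the
        # projected database is represented by offsets, not copied data.
        counts: Dict[str, int] = defaultdict(int)
        for sid, start in projected:
            for item in set(database[sid][start:]):
                counts[item] += 1  # count each item once per sequence

        for item, count in sorted(counts.items()):
            if count < min_support:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count
            # Project on the first occurrence of `item` in each suffix.
            new_projected = []
            for sid, start in projected:
                sequence = database[sid]
                try:
                    position = sequence.index(item, start)
                except ValueError:
                    continue  # this suffix does not contain `item`
                new_projected.append((sid, position + 1))
            grow(new_prefix, new_projected)

    grow((), [(sid, 0) for sid in range(len(database))])
    return patterns


# Toy database (an assumption for illustration only).
toy_sequences = [list("abcb"), list("acb"), list("bca")]
for pattern, count in sorted(prefixspan(toy_sequences, min_support=2).items()):
    print("".join(pattern), count)
```

No candidate is generated and then tested here: an item is appended to a prefix only after it has been counted as frequent in that prefix's projected database, which is the contrast with generation-and-test that the abstract draws.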

References

  1. Agrawal R, Srikant R. Mining sequential patterns. In Proc. 1995 Int. Conf. Data Engineering (ICDE'95), Taipei, Taiwan, Mar. 1995, pp.3–14.

  2. Srikant R, Agrawal R. Mining sequential patterns: Generalizations and performance improvements. In Proc. 5th Int. Conf. Extending Database Technology (EDBT'96), Avignon, France, Mar. 1996, pp.3–17.

  3. Mannila H, Toivonen H, Verkamo A I. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1997, 1: 259–289.

  4. Wang J, Chirn G, Marr T, Shapiro B, Shasha D, Zhang K. Combinatorial pattern discovery for scientific data: Some preliminary results. In Proc. 1994 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'94), Minneapolis, MN, May 1994, pp.115–125.

  5. Bettini C, Wang X S, Jajodia S. Mining temporal relationships with multiple granularities in time sequences. Data Engineering Bulletin, 1998, 21: 32–38.

  6. Zaki M J. Efficient enumeration of frequent sequences. In Proc. 7th Int. Conf. Information and Knowledge Management (CIKM'98), Washington D.C., Nov. 1998, pp.68–75.

  7. Masseglia F, Cathala F, Poncelet P. The PSP approach for mining sequential patterns. In Proc. 1998 European Symp. Principle of Data Mining and Knowledge Discovery (PKDD'98), Nantes, France, Sept. 1998, pp.176–184.

  8. Lu H, Han J, Feng L. Stock movement and n-dimensional inter-transaction association rules. In Proc. 1998 SIGMOD Workshop Research Issues on Data Mining and Knowledge Discovery (DMKD'98), Seattle, WA, June 1998, pp.12:1–12:7.

  9. Özden B, Ramaswamy S, Silberschatz A. Cyclic association rules. In Proc. 1998 Int. Conf. Data Engineering (ICDE'98), Orlando, FL, Feb. 1998, pp.412–421.

  10. Han J, Dong G, Yin Y. Efficient mining of partial periodic patterns in time series database. In Proc. 1999 Int. Conf. Data Engineering (ICDE'99), Sydney, Australia, April 1999, pp.106–115.

  11. Ramaswamy S, Mahajan S, Silberschatz A. On the discovery of interesting patterns in association rules. In Proc. 1998 Int. Conf. Very Large Data Bases (VLDB'98), New York, NY, Aug. 1998, pp.368–379.

  12. Guha S, Rastogi R, Shim K. ROCK: A robust clustering algorithm for categorical attributes. In Proc. 1999 Int. Conf. Data Engineering (ICDE'99), Sydney, Australia, Mar. 1999, pp.512–521.

  13. Zaki M. SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, 2001, 40: 31–60.

  14. Zaki M J, Hsiao C J. CHARM: An efficient algorithm for closed itemset mining. In Proc. 2002 SIAM Int. Conf. Data Mining (SDM'02), Arlington, VA, April 2002, pp.457–473.

  15. Agrawal R, Srikant R. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), Santiago, Chile, Sept. 1994, pp.487–499.

  16. Han J, Pei J, Mortazavi-Asl B, Chen Q, Dayal U, Hsu M C. FreeSpan: Frequent pattern-projected sequential pattern mining. In Proc. 2000 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'00), Boston, MA, Aug. 2000, pp.355–359.

  17. Inokuchi A, Washio T, Motoda H. An apriori-based algorithm for mining frequent substructures from graph data. In Proc. 2000 European Symp. Principle of Data Mining and Knowledge Discovery (PKDD'00), Lyon, France, Sept. 2000, pp.13–23.

  18. Kuramochi M, Karypis G. Frequent subgraph discovery. In Proc. 2001 Int. Conf. Data Mining (ICDM'01), San Jose, CA, Nov. 2001, pp.313–320.

  19. Vanetik N, Gudes E, Shimony S E. Computing frequent graph patterns from semistructured data. In Proc. 2002 Int. Conf. Data Mining (ICDM'02), Maebashi, Japan, Dec. 2002, pp.458–465.

  20. Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'00), Dallas, TX, May 2000, pp.1–12.

  21. Cormen T, Leiserson C, Rivest R, Stein C. Introduction to Algorithms, 2nd ed. The MIT Press, Cambridge, MA, 2001.

  22. Yan X, Han J. gSpan: Graph-based substructure pattern mining. In UIUC-CS Tech. Report: R-2002-2296. A 4-page short version published in Proc. 2002 Int. Conf. Data Mining (ICDM'02), Maebashi, Japan, 2002, pp.721–724.

  23. Kohavi R, Brodley C, Frasca B, Mason L, Zheng Z. KDD-Cup 2000 organizers' report: Peeling the onion. SIGKDD Explorations, 2000, 2: 86–98.

  24. Han J, Fu Y. Discovery of multiple-level association rules from large databases. In Proc. 1995 Int. Conf. Very Large Data Bases (VLDB'95), Zurich, Switzerland, Sept. 1995, pp.420–431.

  25. Kamber M, Han J, Chiang J Y. Metarule-guided mining of multi-dimensional association rules using data cubes. In Proc. 1997 Int. Conf. Knowledge Discovery and Data Mining (KDD'97), Newport Beach, CA, Aug. 1997, pp.207–210.

  26. Grahne G, Lakshmanan L V S, Wang X, Xie M H. On dual mining: From patterns to circumstances, and back. In Proc. 2001 Int. Conf. Data Engineering (ICDE'01), Heidelberg, Germany, April 2001, pp.195–204.

  27. Pinto H, Han J, Pei J, Wang K, Chen Q, Dayal U. Multidimensional sequential pattern mining. In Proc. 2001 Int. Conf. Information and Knowledge Management (CIKM'01), Atlanta, GA, Nov. 2001, pp.81–88.

  28. Beyer K, Ramakrishnan R. Bottom-up computation of sparse and iceberg cubes. In Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'99), Philadelphia, PA, June 1999, pp.359–370.

  29. Han J, Pei J, Dong G, Wang K. Efficient computation of iceberg cubes with complex measures. In Proc. 2001 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'01), Santa Barbara, CA, May 2001, pp.1–12.

  30. Ng R, Lakshmanan L V S, Han J, Pang A. Exploratory mining and pruning optimizations of constrained associations rules. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'98), Seattle, WA, June 1998, pp.13–24.

  31. Bayardo R J, Agrawal R, Gunopulos D. Constraint-based rule mining on large, dense data sets. In Proc. 1999 Int. Conf. Data Engineering (ICDE'99), Sydney, Australia, April 1999, pp.188–197.

  32. Pei J, Han J, Lakshmanan L V S. Mining frequent itemsets with convertible constraints. In Proc. 2001 Int. Conf. Data Engineering (ICDE'01), Heidelberg, Germany, April 2001, pp.433–442.

  33. Pei J, Han J, Wang W. Constraint-based sequential pattern mining in large databases. In Proc. 2002 Int. Conf. Information and Knowledge Management (CIKM'02), McLean, VA, Nov. 2002, pp.18–25.

  34. Asai T, Abe K, Kawasoe S, Arimura H, Sakamoto H, Arikawa S. Efficient substructure discovery from large semi-structured data. In Proc. 2002 SIAM Int. Conf. Data Mining (SDM'02), Part III, Arlington, VA, April 2002.

  35. Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M C. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proc. 2001 Int. Conf. Data Engineering (ICDE'01), Heidelberg, Germany, April 2001, pp.215–224.

  36. Yan X, Han J. gSpan: Graph-based substructure pattern mining. In Proc. 2002 Int. Conf. Data Mining (ICDM'02), Maebashi, Japan, Dec. 2002, pp.721–724.

  37. Yan X, Han J. CloseGraph: Mining closed frequent graph patterns. In Proc. 2003 ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD'03), Washington D.C., Aug. 2003.

Author information

Corresponding author

Correspondence to Jia-Wei Han.

Additional information

Survey

The work was supported in part by the Natural Sciences and Engineering Research Council of Canada, the Networks of Centres of Excellence of Canada, the Hewlett-Packard Lab, the U.S. National Science Foundation (Grant Nos. NSF IIS-02-09199, NSF IIS-03-08001), and the University of Illinois. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

Jia-Wei Han is a professor of computer science at the University of Illinois at Urbana-Champaign. He has been working on research into data mining, data warehousing, and database systems, with over 250 conference and journal publications. He has chaired or served on the PCs of many international conferences, including ACM SIGKDD, ACM SIGMOD, VLDB, ICDE, ICDM, SDM, and EDBT. He has served or is serving on the editorial boards of Data Mining and Knowledge Discovery, IEEE Transactions on Knowledge and Data Engineering, Journal of Intelligent Information Systems, and Journal of Computer Science and Technology. He is currently serving on the Board of Directors for the Executive Committee of ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). He has received the Outstanding Contribution Award at the 2002 ICDM, ACM Service Award, and IBM Faculty Awards. He is an ACM Fellow and the first author of the textbook “Data Mining: Concepts and Techniques” (Morgan Kaufmann, 2001).

Jian Pei received the B.Eng. and the M.Eng. degrees, both in computer science, from Shanghai Jiaotong University, China, in 1991 and 1993, respectively, and the Ph.D. degree in computing science from Simon Fraser University, Canada, in 2002. He was a Ph.D. candidate at Peking University in 1997–1999.

He is currently an assistant professor of computer science and engineering at the State University of New York at Buffalo, USA. He is a participating faculty member in the Center of Unified Biometrics and Sensors (CUBS) at the State University of New York at Buffalo. His research interests include data mining, data warehousing, online analytical processing, database systems, and bioinformatics. His current research is supported in part by the National Science Foundation (NSF).

He has published over 40 research papers in refereed journals, conferences, and workshops, served on the program committees of over 30 international conferences and workshops, and been a reviewer for some leading academic journals. He is a member of the ACM, the ACM SIGMOD, the ACM SIGKDD and the IEEE Computer Society, and a guest area editor of the Journal of Computer Science and Technology.

Xi-Feng Yan received a B.E. degree in computer engineering from Zhejiang University, China, in 1997, and an M.S. degree in computer science from the State University of New York at Stony Brook, NY, in 2001. He is currently a Ph.D. candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. His research interests include data mining, structural/graph pattern mining, and their applications in database systems and bioinformatics.

About this article

Cite this article

Han, JW., Pei, J. & Yan, XF. From sequential pattern mining to structured pattern mining: A pattern-growth approach. J. Comput. Sci. & Technol. 19, 257–279 (2004). https://doi.org/10.1007/BF02944897

  • DOI: https://doi.org/10.1007/BF02944897
