Fast Discovery of Sequential Patterns by Memory Indexing

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2454))

Abstract

Mining sequential patterns is an important issue because of the complexity of discovering temporal patterns from sequences. Current mining approaches either require many database scans or generate several intermediate databases. As databases may fit into ever-increasing main memory, efficient memory-based discovery of sequential patterns becomes possible. In this paper, we propose a memory indexing approach for fast sequential pattern mining, named MEMISP. During the whole process, MEMISP scans the sequence database only once, to read the data sequences into memory. The find-then-index technique recursively finds the items that constitute a frequent sequence and constructs a compact index set that indicates the set of data sequences for further exploration. Through effective index advancing, fewer and shorter data sequences need to be processed in MEMISP as the discovered patterns get longer. Moreover, the maximum size of total memory required, which is independent of the minimum support threshold in MEMISP, can be estimated. The experiments indicate that MEMISP outperforms both the GSP and PrefixSpan algorithms. MEMISP also has good linear scalability even with very low minimum support. When the database is too large to fit in memory in a single batch, we partition the database, mine patterns in each partition, and validate the true patterns in a second pass of database scanning. Therefore, MEMISP may efficiently mine databases of any size, for any minimum support value.
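
The abstract gives no pseudocode, so the following Python is only a minimal sketch of the find-then-index idea and of the partition-and-validate fallback, not the authors' implementation. It assumes a simplified data model in which every data sequence is a flat list of single items (the paper's sequences are made of itemsets), and the names `find_then_index`, `mine_partitioned`, `min_count`, and `min_ratio` are hypothetical. Each index entry is a (sequence id, position) pair; advancing the position means longer patterns look at fewer and shorter suffixes.

```python
import math
from collections import defaultdict


def find_then_index(db, min_count):
    """Return {pattern tuple: support count} for all frequent patterns.

    db is an in-memory list of sequences; each sequence is a list of items
    (a simplification: no itemsets inside a sequence element).
    """
    patterns = {}

    def grow(prefix, index_set):
        # index_set: (sequence id, position) pairs, where position is the
        # offset at which the search for the next item starts.
        advanced = defaultdict(list)
        # Find step: count each item at most once per indexed sequence,
        # remembering the position right after its first occurrence.
        for sid, start in index_set:
            seq = db[sid]
            seen = set()
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    seen.add(item)
                    advanced[item].append((sid, pos + 1))
        # Index step: every frequent item yields a longer pattern and a
        # smaller index set, which is explored recursively.
        for item, new_index in advanced.items():
            if len(new_index) >= min_count:
                pattern = prefix + (item,)
                patterns[pattern] = len(new_index)
                grow(pattern, new_index)

    grow((), [(sid, 0) for sid in range(len(db))])
    return patterns


def is_subsequence(pattern, seq):
    """True if the items of pattern occur in seq in the same order."""
    it = iter(seq)
    return all(item in it for item in pattern)


def mine_partitioned(db, min_ratio, n_parts):
    """Mine each partition separately, then validate against the whole db.

    A pattern that is frequent in the whole database must be frequent in at
    least one partition at the same relative support, so the union of the
    per-partition results is a complete candidate set.
    """
    size = math.ceil(len(db) / n_parts)
    candidates = set()
    for i in range(0, len(db), size):
        part = db[i:i + size]
        local_min = max(1, math.ceil(min_ratio * len(part)))
        candidates.update(find_then_index(part, local_min))
    # Second pass: count every candidate over the full database.
    result = {}
    for pattern in candidates:
        count = sum(is_subsequence(pattern, seq) for seq in db)
        if count >= min_ratio * len(db):
            result[pattern] = count
    return result
```

In this sketch the whole database is a Python list, so the "second pass" is just another loop; with a database on disk it would be a second scan of the file. Calling `find_then_index(db, min_count)` and `mine_partitioned(db, min_count / len(db), n_parts)` on the same small database should return the same patterns.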

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lin, MY., Lee, SY. (2002). Fast Discovery of Sequential Patterns by Memory Indexing. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_15

  • DOI: https://doi.org/10.1007/3-540-46145-0_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44123-6

  • Online ISBN: 978-3-540-46145-6

  • eBook Packages: Springer Book Archive
