Abstract
Mining sequential patterns is an important problem because discovering temporal patterns in sequences is computationally complex. Current mining approaches either require many database scans or generate several intermediate databases. As databases increasingly fit into ever-growing main memory, efficient memory-based discovery of sequential patterns becomes possible. In this paper, we propose a memory indexing approach for fast sequential pattern mining, named MEMISP. During the whole process, MEMISP scans the sequence database only once, to read the data sequences into memory. The find-then-index technique recursively finds the items that constitute a frequent sequence and constructs a compact index set that indicates the set of data sequences for further exploration. Through effective index advancing, fewer and shorter data sequences need to be processed in MEMISP as the discovered patterns get longer. Moreover, the maximum size of the total memory required, which is independent of the minimum support threshold in MEMISP, can be estimated. The experiments indicate that MEMISP outperforms both the GSP and PrefixSpan algorithms. MEMISP also scales linearly even with very low minimum support. When the database is too large to fit in memory in a single batch, we partition the database, mine patterns in each partition, and validate the true patterns in a second pass of database scanning. Therefore, MEMISP can efficiently mine databases of any size, for any minimum support value.
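The find-then-index idea and the partition-and-validate fallback described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each data sequence is a list of single items (the paper's setting allows itemsets), and the names `mine`, `mine_partitioned`, and `is_subsequence` are hypothetical, chosen only for this illustration. The index is a list of (sequence id, position) pairs, so growing a pattern advances positions instead of copying or projecting sequences.

```python
from collections import defaultdict


def mine(sequences, min_count):
    """Return {pattern_tuple: support_count} for all frequent patterns.

    Assumption: each data sequence is a list of single items.
    """
    patterns = {}

    def grow(prefix, index):
        # index holds (seq_id, pos) pairs; pos is where the still-unmatched
        # part of sequence seq_id begins. Sequences are never copied.
        first_hit = defaultdict(list)  # item -> [(seq_id, next_pos), ...]
        for sid, pos in index:
            seq = sequences[sid]
            seen = set()
            for i in range(pos, len(seq)):
                item = seq[i]
                if item not in seen:  # count a sequence once per item
                    seen.add(item)
                    first_hit[item].append((sid, i + 1))
        for item in sorted(first_hit):
            entries = first_hit[item]
            if len(entries) >= min_count:
                pattern = prefix + (item,)
                patterns[pattern] = len(entries)
                # Advanced index: fewer entries, later start positions.
                grow(pattern, entries)

    grow((), [(sid, 0) for sid in range(len(sequences))])
    return patterns


def is_subsequence(pattern, seq):
    """True if the items of pattern occur in seq in the same order."""
    it = iter(seq)
    return all(item in it for item in pattern)


def mine_partitioned(sequences, min_count, partition_size):
    """Two-pass variant for data that is mined one partition at a time."""
    n = len(sequences)
    candidates = set()
    # Pass 1: mine each partition with a proportionally scaled threshold.
    for start in range(0, n, partition_size):
        part = sequences[start:start + partition_size]
        local_min = max(1, (min_count * len(part) + n - 1) // n)
        candidates.update(mine(part, local_min))
    # Pass 2: validate every candidate against the whole database.
    result = {}
    for pattern in candidates:
        count = sum(is_subsequence(pattern, s) for s in sequences)
        if count >= min_count:
            result[pattern] = count
    return result


db = [list("abc"), list("acb"), list("ab"), list("bc")]
frequent = mine(db, min_count=2)
```

In this sketch the local threshold is scaled by partition size, so any pattern that is frequent in the whole database is frequent in at least one partition and therefore reaches the validation pass. How the paper chooses partition sizes and thresholds is not stated in the abstract, so those details are assumptions here.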
References
Agarwal, R. C., Aggarwal, C. C., Prasad, V. V. V.: Depth First Generation of Long Patterns. Proc. 6th SIGKDD (2000) 108–118
Agrawal, R., Srikant, R.: Mining Sequential Patterns. Proc. 11th ICDE (1995) 3–14
Garofalakis, M. N., Rastogi, R., Shim, K.: SPIRIT: Sequential Pattern Mining with Regular Expression Constraints. Proc. 25th VLDB (1999) 223–234
Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.-C.: FreeSpan: Frequent Pattern-projected Sequential Pattern Mining. Proc. 6th SIGKDD (2000) 355–359
Lin, M. Y., Lee, S. Y.: Incremental Update on Sequential Patterns in Large Databases. Proc. 10th ICTAI (1998) 24–31
Masseglia, F., Cathala, F., Poncelet, P.: The PSP Approach for Mining Sequential Patterns. Proc. 2nd Euro. Symp. PKDD (1998) 176–184
Parthasarathy, S., Zaki, M. J., Ogihara, M., Dwarkadas, S.: Incremental and Interactive Sequence Mining. Proc. 8th CIKM (1999) 251–258
Pei, J., Han, J., Pinto, H., Chen, Q., Dayal, U., Hsu, M.-C.: PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-projected Pattern Growth. Proc. 2001 ICDE (2001) 215–224
Shintani, T., Kitsuregawa, M.: Mining Algorithms for Sequential Patterns in Parallel: Hash Based Approach. Proc. 2nd PAKDD (1998) 283–294
Srikant, R., Agrawal, R.: Mining Sequential Patterns: Generalizations and Performance Improvements. Proc. 5th EDBT (1996) 3–17
Thomas, S., Sarawagi, S.: Mining Generalized Association Rules and Sequential Patterns Using SQL Queries. Proc. 4th SIGKDD (1998) 344–348
Zaki, M. J.: SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning Journal, Vol. 42, No. 1/2 (2001) 31–60
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, MY., Lee, SY. (2002). Fast Discovery of Sequential Patterns by Memory Indexing. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_15
DOI: https://doi.org/10.1007/3-540-46145-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44123-6
Online ISBN: 978-3-540-46145-6
eBook Packages: Springer Book Archive