Fast Discovery of Sequential Patterns by Memory Indexing

Lin, Ming-Yen; Lee, Suh-Yin

doi:10.1007/3-540-46145-0_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2454)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

Mining sequential patterns is an important problem because of the complexity of discovering temporal patterns in sequences. Current mining approaches either scan the database many times or generate several intermediate databases. As databases increasingly fit into ever-growing main memory, efficient memory-based discovery of sequential patterns becomes possible. In this paper, we propose a memory indexing approach for fast sequential pattern mining, named MEMISP. During the whole process, MEMISP scans the sequence database only once, to read the data sequences into memory. The find-then-index technique recursively finds the items that constitute a frequent sequence and constructs a compact index set that indicates the set of data sequences for further exploration. Through effective index advancing, fewer and shorter data sequences need to be processed in MEMISP as the discovered patterns get longer. Moreover, the maximum total memory required, which is independent of the minimum support threshold in MEMISP, can be estimated. The experiments indicate that MEMISP outperforms both the GSP and PrefixSpan algorithms. MEMISP also has good linear scalability, even with very low minimum support. When the database is too large to fit in memory in a single batch, we partition the database, mine patterns in each partition, and validate the true patterns in a second pass of database scanning. Therefore, MEMISP can efficiently mine databases of any size, for any minimum support value.
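The find-then-index idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes each data sequence is a plain list of single items (the paper's setting is more general), and the names `mine_with_index`, `grow`, and `min_count` are hypothetical. The database is held in memory once, and each recursive step keeps only an index of (sequence id, position) pairs instead of building projected databases.

```python
from collections import defaultdict


def mine_with_index(db, min_count):
    """Return {pattern tuple: support count} for an in-memory sequence database.

    db is a list of sequences; each sequence is a list of single items
    (a simplifying assumption for this sketch).
    """
    patterns = {}

    def grow(prefix, index):
        # index holds (seq_id, pos) pairs: only db[seq_id][pos:] can still
        # extend the current prefix.
        # "Find" step: count each item at most once per indexed sequence.
        support = defaultdict(int)
        for sid, pos in index:
            for item in set(db[sid][pos:]):
                support[item] += 1

        for item in sorted(support):
            count = support[item]
            if count < min_count:
                continue
            pattern = prefix + (item,)
            patterns[pattern] = count

            # "Index" step: advance each pointer just past the first match.
            # Sequences without the item, or with nothing left after it,
            # drop out, so the index shrinks as patterns get longer.
            new_index = []
            for sid, pos in index:
                seq = db[sid]
                try:
                    nxt = seq.index(item, pos) + 1
                except ValueError:
                    continue
                if nxt < len(seq):
                    new_index.append((sid, nxt))
            grow(pattern, new_index)

    grow((), [(sid, 0) for sid in range(len(db))])
    return patterns


db = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"], ["a", "b"]]
result = mine_with_index(db, min_count=2)
```

In this toy database, the index for the prefix `("a", "b")` contains a single pointer (into the first sequence), which shows how fewer and shorter suffixes remain to be examined as a pattern grows.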

References

Agarwal, R. C., Aggarwal, C. C., Prasad, V. V. V.: Depth First Generation of Long Patterns. Proc. 6th SIGKDD (2000) 108–118
Agrawal, R., Srikant, R.: Mining Sequential Patterns. Proc. 11th ICDE (1995) 3–14
Garofalakis, M. N., Rastogi, R., Shim, K.: SPIRIT: Sequential Pattern Mining with Regular Expression Constraints. Proc. 25th VLDB (1999) 223–234
Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.-C.: FreeSpan: Frequent Pattern-projected Sequential Pattern Mining. Proc. 6th SIGKDD (2000) 355–359
Lin, M. Y., Lee, S. Y.: Incremental Update on Sequential Patterns in Large Databases. Proc. 10th ICTAI (1998) 24–31
Masseglia, F., Cathala, F., Poncelet, P.: The PSP Approach for Mining Sequential Patterns. Proc. 2nd Euro. Symp. PKDD (1998) 176–184
Parthasarathy, S., Zaki, M. J., Ogihara, M., Dwarkadas, S.: Incremental and Interactive Sequence Mining. Proc. 8th CIKM (1999) 251–258
Pei, J., Han, J., Pinto, H., Chen, Q., Dayal, U., Hsu, M.-C.: PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-projected Pattern Growth. Proc. 2001 ICDE (2001) 215–224
Shintani, T., Kitsuregawa, M.: Mining Algorithms for Sequential Patterns in Parallel: Hash Based Approach. Proc. 2nd PAKDD (1998) 283–294
Srikant, R., Agrawal, R.: Mining Sequential Patterns: Generalizations and Performance Improvements. Proc. 5th EDBT (1996) 3–17
Thomas, S., Sarawagi, S.: Mining Generalized Association Rules and Sequential Patterns Using SQL Queries. Proc. 4th SIGKDD (1998) 344–348
Zaki, M. J.: SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning Journal, Vol. 42, No. 1/2 (2001) 31–60


Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Chiao Tung University, 30050, Taiwan, ROC
Ming-Yen Lin & Suh-Yin Lee

Editor information

Editors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, 606-8501, Kyoto, Japan
Yahiko Kambayashi
Institute for Computer Science and Business Informatics, University of Vienna, Liebiggasse 4, 1010, Vienna, Austria
Werner Winiwarter
Center for Spatial Information Science (CSIS), University of Tokyo, 4-6-1, Komaba, Meguro-ku, 153-8904, Tokyo, Japan
Masatoshi Arikawa

About this paper

Cite this paper

Lin, MY., Lee, SY. (2002). Fast Discovery of Sequential Patterns by Memory Indexing. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_15

DOI: https://doi.org/10.1007/3-540-46145-0_15
Published: 02 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44123-6
Online ISBN: 978-3-540-46145-6
eBook Packages: Springer Book Archive
