Abstract
In this paper, we address the problem of mining sequential patterns across multiple time sequences. Building on PrefixSpan, a state-of-the-art sequential pattern mining algorithm for transaction databases, we propose MILE (MIning in muLtiple sEquences), an efficient algorithm for this task. MILE recursively uses the knowledge of existing patterns to avoid redundant data scanning, and can therefore speed up the discovery of new patterns. Another unique feature of MILE is that it can incorporate prior knowledge of the data distribution in time sequences into the mining process to further improve performance. Extensive empirical results show that MILE is significantly faster than PrefixSpan. As MILE consumes more memory than PrefixSpan, we also present a solution that trades time efficiency for lower memory use in memory-constrained environments.
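For readers unfamiliar with the baseline, the prefix-projection idea behind PrefixSpan can be sketched as follows. This is a minimal illustration, not MILE and not the authors' implementation: it assumes each sequence is a list of single events (no itemsets), and the names `prefixspan`, `grow`, and `min_support` are chosen here for illustration only.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for sequences of single items.

    Simplifying assumption: every element of a sequence is one event,
    not an itemset as in the full PrefixSpan formulation.
    """
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence_index, start_offset) pairs, one per
        # sequence containing the prefix. Only the suffix starting at
        # start_offset still needs to be scanned.
        counts = defaultdict(int)
        for seq_idx, start in projected:
            # Count each item at most once per sequence.
            for item in set(sequences[seq_idx][start:]):
                counts[item] += 1

        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support

            # Project each suffix past the first occurrence of item.
            next_projected = []
            for seq_idx, start in projected:
                try:
                    pos = sequences[seq_idx].index(item, start)
                except ValueError:
                    continue  # item does not occur in this suffix
                next_projected.append((seq_idx, pos + 1))
            grow(pattern, next_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    data = [list("abcb"), list("acb"), list("bc")]
    for pattern, support in prefixspan(data, min_support=2).items():
        print(pattern, support)
```

Each recursive call scans only the projected suffixes of sequences that already contain the current prefix, which is the sense in which known patterns limit further data scanning. How MILE extends this idea to multiple time sequences is not specified in the abstract and is not modeled here.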
Chen, G., Wu, X. & Zhu, X. Mining Sequential Patterns across Time Sequences. New Gener. Comput. 26, 75–96 (2007). https://doi.org/10.1007/s00354-007-0036-2