Mining Sequential Patterns across Time Sequences

New Generation Computing

Abstract

In this paper, we address the problem of mining sequential patterns across multiple time sequences. Building on PrefixSpan, a state-of-the-art sequential pattern mining algorithm for transaction databases, we propose MILE (MIning in muLtiple sEquences), an efficient algorithm for this task. MILE recursively reuses knowledge of already discovered patterns to avoid redundant data scanning, which speeds up the discovery of new patterns. Another distinctive feature of MILE is that it can incorporate prior knowledge of the data distribution in the time sequences into the mining process to further improve performance. Extensive empirical results show that MILE is significantly faster than PrefixSpan. Because MILE consumes more memory than PrefixSpan, we also present a solution that trades time efficiency for reduced memory use in memory-constrained environments.
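
For readers unfamiliar with the baseline, the following is a minimal sketch of the prefix-projection idea behind PrefixSpan, restricted to sequences of single items. It is not MILE and not the authors' implementation: the function name `prefixspan`, the `(sequence index, offset)` representation of projected databases, and the toy input are all assumptions made for illustration.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for every frequent sequential pattern.

    Simplified, hypothetical sketch: each sequence is a list of single items,
    and support is the number of sequences that contain the pattern as a
    (not necessarily contiguous) subsequence.
    """
    results = {}

    def mine(prefix, projected):
        # `projected` lists (sequence index, start offset) pairs: the suffix
        # of each sequence that follows the earliest match of `prefix`.
        extensions = defaultdict(list)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            seen = set()
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:  # count each item once per sequence
                    seen.add(item)
                    extensions[item].append((seq_id, pos + 1))
        for item, next_projected in sorted(extensions.items()):
            if len(next_projected) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(next_projected)
                # Grow the pattern using only the projected suffixes, so the
                # full sequences are never rescanned from the beginning.
                mine(pattern, next_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results
```

Calling `prefixspan([list("abcb"), list("acb"), list("ab")], 2)` reports, among others, the pattern `("a", "b")` with support 3 and `("a", "c", "b")` with support 2. The sketch only illustrates the pattern-growth baseline that the abstract names; how MILE reuses existing patterns or exploits distribution knowledge is described in the full article.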

References

  1. Agrawal, R. and Srikant, R., “Mining Sequential Patterns,” in Proc. of the 11th Int’l Conf. on Data Engineering, pp. 3-14, 1995.

  2. Ayres, J., Flannick, J., Gehrke, J. and Yiu, T., “Sequential Pattern Mining Using a Bitmap Representation,” in Proc. of the 8th Int’l Conf. on Knowledge Discovery and Data Mining, pp. 429-435, 2002.

  3. Bettini, C., Wang, X.S., Jajodia, S. and Lin, J., “Discovering Frequent Event Patterns with Multiple Granularities in Time Sequences,” IEEE Transactions on Knowledge and Data Engineering, 10-2, pp. 222-237, 1998.

  4. Charikar, M., Chen, K. and Farach-Colton, M., “Finding Frequent Items in Data Streams,” in Proc. of Int’l Colloquium on Automata, Languages and Programming, pp. 508-515, 2002.

  5. Das, G., Lin, K., Mannila, H., Renganathan, G. and Smyth, P., “Rule Discovery from Time Series,” in Proc. of the 4th Int’l Conf. of Knowledge Discovery and Data Mining, pp. 16-22, 1998.

  6. Gao, L. and Wang, X.S., “Continually Evaluating Similarity-based Pattern Queries on a Streaming Time Series,” in Proc. of the 2002 ACM SIGMOD Int’l Conf. on Management of Data, pp. 370-381, 2002.

  7. Keogh, E. and Lin, J., “Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research,” Knowledge and Information Systems, 8-2, pp. 154-177, 2005.

  8. Keogh, E. and Smyth, P., “A Probabilistic Approach to Fast Pattern Matching in Time Series Databases,” in Proc. of the 3rd Int’l Conf. of Knowledge Discovery and Data Mining, pp. 16-22, 1997.

  9. Manku, G. S. and Motwani, R., “Approximate Frequency Counts over Data Streams,” in Proc. of the 28th Int’l Conf. on Very Large Data Bases, pp. 346-357, 2002.

  10. Mannila, H., Toivonen, H. and Verkamo, A.I., “Discovery of Frequent Episodes in Event Sequences,” Data Mining and Knowledge Discovery, 1-3, pp. 259-289, 1997.

  11. Oates, T. and Cohen, P.R., “Searching for Structure in Multiple Streams of Data,” in Proc. of the 13th Int’l Conf. on Machine Learning, pp. 346-354, 1996.

  12. Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U. and Hsu, M., “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix Projected Pattern Growth,” in Proc. of the 17th Int’l Conf. on Data Engineering, pp. 215-226, 2001.

  13. Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., Dayal, U. and Hsu, M., “Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach,” IEEE Transactions on Knowledge and Data Engineering, 16-11, pp. 1424-1440, 2004.

  14. Srikant, R. and Agrawal, R., “Mining Sequential Patterns: Generalized and Performance Improvements,” in Proc. of the 5th Int’l Conf. on Extending Database Technology, pp. 3-17, 1996.

  15. Wang, M. and Wang, X.S., “Efficient Evaluation of Composite Correlations for Streaming Time Series,” in Proc. of the 4th Int’l Conf. on Web-Age Information Management, pp. 369-380, 2003.

  16. Yang, Y., Webb, G. and Wu, X., “Discretization Methods,” in Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers (O. Maimon and L. Rokach eds.), Kluwer Academic Publishers, 2005.

  17. Yi, B., Sidiropoulos, N., Johnson, W., Jagadish, H.V., Faloutsos, C. and Biliris, A., “Online Data Mining for Co-Evolving Time Sequences,” in Proc. of the 16th Int’l Conf. on Data Engineering, pp. 13-22, 2000.

  18. Zaki, M. J., “Efficient Enumeration of Frequent Sequences,” in Proc. of the 7th Int’l Conf. on Information and Knowledge Management, pp. 68-75, 1998.

  19. Zaki, M. J., “SPADE: An Efficient Algorithm for Mining Frequent Sequences,” Machine Learning, 42-1/2, pp. 31-60, 2001.

  20. Zhu, Y. and Shasha, D., “StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time,” in Proc. of the 28th Int’l Conf. on Very Large Data Bases, pp. 358-369, 2002.

Author information

Corresponding author

Correspondence to Gong Chen.

About this article

Cite this article

Chen, G., Wu, X. & Zhu, X. Mining Sequential Patterns across Time Sequences. New Gener. Comput. 26, 75–96 (2007). https://doi.org/10.1007/s00354-007-0036-2

  • DOI: https://doi.org/10.1007/s00354-007-0036-2
