Efficient frequent sequence mining by a dynamic strategy switching algorithm

  • Regular Paper
  • The VLDB Journal

Abstract

Mining frequent sequences in large databases is an important research topic. The main challenge is the high processing cost incurred by the large volume of data. In this paper, we propose a novel strategy that finds all frequent sequences without computing the support counts of non-frequent sequences. Previous work prunes candidate sequences based on shorter frequent sequences, whereas our strategy prunes candidate sequences based on non-frequent sequences of the same length. As a result, our strategy can be combined with previous approaches to achieve better performance. We then identify three major strategies used in previous work and combine them with our strategy into an efficient algorithm. The novelty of our algorithm lies in its ability to switch dynamically from a previous strategy to our new strategy during the mining process. Experimental results show that our algorithm outperforms the previous ones under various parameter settings.
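
To make the baseline idea concrete, the following Python sketch illustrates the kind of level-wise pruning the abstract attributes to previous work: a candidate of length k+1 is counted only if all of its length-k subsequences are already known to be frequent. It is not the algorithm proposed in this paper, whose same-length pruning strategy is not described in the abstract. The function names are hypothetical, and sequences are simplified to strings of single items rather than itemsets.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of database sequences that contain pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def mine_frequent_sequences(database, min_support):
    """Level-wise mining with pruning by shorter frequent sequences (illustrative only)."""
    items = sorted({item for seq in database for item in seq})
    current = {}
    for item in items:
        count = support((item,), database)
        if count >= min_support:
            current[(item,)] = count
    result = dict(current)

    while current:
        candidates = set()
        for p in current:
            for q in current:
                # Join two frequent length-k sequences that overlap on k-1 items.
                if p[1:] == q[:-1]:
                    cand = p + q[-1:]
                    # Prune: every subsequence obtained by deleting one item
                    # must itself be frequent, otherwise skip support counting.
                    if all(cand[:i] + cand[i + 1:] in current
                           for i in range(len(cand))):
                        candidates.add(cand)
        current = {}
        for cand in candidates:
            count = support(cand, database)
            if count >= min_support:
                current[cand] = count
        result.update(current)
    return result


if __name__ == "__main__":
    db = ["abcd", "acbd", "abd", "bcd"]
    for pattern, count in sorted(mine_frequent_sequences(db, 3).items()):
        print("".join(pattern), count)
```

Even with pruning, this baseline still counts the support of some candidates that turn out to be non-frequent (for example `ac` in the toy database above), which is the cost the paper's strategy aims to avoid.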

Author information

Corresponding author

Correspondence to Arbee L. P. Chen.

Additional information

This paper is a substantially extended version of the following paper: D. Y. Chiu, Y. H. Wu, A. L. P. Chen, “An Efficient Algorithm for Mining Frequent Sequences by a New Strategy without Support Counting,” Proceedings of IEEE Data Engineering Conference, pp. 375–386, 2004.

About this article

Cite this article

Chiu, DY., Wu, YH. & Chen, A.L.P. Efficient frequent sequence mining by a dynamic strategy switching algorithm. The VLDB Journal 18, 303–327 (2009). https://doi.org/10.1007/s00778-008-0100-7

  • DOI: https://doi.org/10.1007/s00778-008-0100-7
