Abstract
Mining frequent sequences in large databases has been an important research topic. The main challenge of mining frequent sequences is the high processing cost incurred by the large amount of data. In this paper, we propose a novel strategy that finds all the frequent sequences without computing the support counts of non-frequent sequences. Previous works prune candidate sequences based on frequent sequences of shorter lengths, whereas our strategy prunes candidate sequences according to non-frequent sequences of the same length. As a result, our strategy can be combined with the previous approaches to achieve better performance. We then identify three major strategies used in previous works and combine them with our strategy into an efficient algorithm. The novelty of our algorithm lies in its ability to switch dynamically from a previous strategy to our new strategy during the mining process to improve performance. Experimental results show that our algorithm outperforms the previous ones under various parameter settings.
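The idea of pruning by same-length non-frequent sequences, without counting their supports, can be illustrated with a small sketch. The Python below is a simplified illustration of a comparison-based approach and is not the algorithm from the paper. It assumes each sequence element is a single item (no itemsets) and, for brevity, enumerates every k-subsequence of each customer sequence up front, which a practical implementation would avoid. The names `k_subsequences`, `frequent_k_sequences`, and `min_sup` (an absolute count of customer sequences) are hypothetical. Each customer sequence keeps a cursor on its smallest not-yet-decided k-subsequence. After sorting customers by that key, if the first key equals the `min_sup`-th key, that key is frequent. Otherwise every k-sequence smaller than the `min_sup`-th key can occur only in the customers ranked ahead of it, so it is non-frequent and is skipped by comparison alone.

```python
from bisect import bisect_left, bisect_right
from itertools import combinations


def k_subsequences(seq, k):
    """Sorted distinct length-k subsequences of seq (order kept, gaps allowed)."""
    return sorted(set(combinations(seq, k)))


def frequent_k_sequences(db, k, min_sup):
    """Return the frequent k-sequences of db in lexicographic order.

    Simplified sketch (assumptions): each element is a single item, and
    min_sup is an absolute number of customer sequences.
    """
    if min_sup < 1:
        raise ValueError("min_sup must be at least 1")
    lists = [k_subsequences(s, k) for s in db]
    # cursor[i] points at customer i's smallest k-subsequence not yet decided.
    cursor = [0] * len(db)
    frequent = []
    while True:
        active = [i for i in range(len(db)) if cursor[i] < len(lists[i])]
        if len(active) < min_sup:
            return frequent  # no undecided k-sequence can reach min_sup
        active.sort(key=lambda i: lists[i][cursor[i]])
        first = lists[active[0]][cursor[active[0]]]
        pivot = lists[active[min_sup - 1]][cursor[active[min_sup - 1]]]
        if first == pivot:
            # min_sup customers share this key, so it is frequent.
            frequent.append(first)
            for i in active:
                if lists[i][cursor[i]] == first:
                    cursor[i] = bisect_right(lists[i], first)
        else:
            # Any undecided k-sequence below pivot can occur only in the first
            # min_sup - 1 customers, so it is non-frequent: skip it uncounted.
            for i in active:
                if lists[i][cursor[i]] < pivot:
                    cursor[i] = bisect_left(lists[i], pivot)


if __name__ == "__main__":
    print(frequent_k_sequences(["abc", "acd", "abd"], 2, 2))
```

For example, `frequent_k_sequences(["abc", "acd", "abd"], 2, 2)` returns `[('a', 'b'), ('a', 'c'), ('a', 'd')]`; the non-frequent 2-sequences `bc`, `bd`, and `cd` are never support-counted, only compared.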
References
Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: Proc. of International Conf. on Very Large Data Bases, pp. 487–499 (1994)
Agrawal, R., Srikant, R.: Mining Sequential Patterns. In: Proc. of IEEE International Conf. on Data Engineering, pp. 3–14 (1995)
Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential Pattern Mining Using a Bitmap Representation. In: Proc. of ACM Conf. on Knowledge Discovery and Data Mining (2002)
Bonfield, J.K., Staden, R.: ZTR: A New Format for DNA Sequence Trace Data. Bioinformatics 18(1), 3–10 (2002)
Chiu, D.Y., Wu, Y.H., Chen, A.L.P.: An Efficient Algorithm for Mining Frequent Sequences by a New Strategy without Support Counting. In: Proc. of IEEE International Conf. on Data Engineering, pp. 375–386 (2004)
Cong, S., Han, J., Padua, D.: Parallel Mining of Closed Sequential Patterns. In: Proc. of ACM International Conf. on Knowledge Discovery and Data Mining, pp. 562–567 (2005)
Garofalakis, M.N., Rastogi, R., Shim, K.: Mining Sequential Patterns with Regular Expression Constraints. IEEE Trans. Knowl. Data Eng. 14(3), 530–552 (2002)
Han, J.W., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.C.: FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining. In: Proc. of ACM International Conf. on Knowledge Discovery and Data Mining, pp. 355–359 (2000)
Ho, C.C., Li, H.F., Kuo, F.F., Lee, S.Y.: Incremental Mining of Sequential Patterns over a Stream Sliding Window. In: Proc. of IEEE International Conf. on Data Mining Workshops, pp. 677–681 (2006)
Hsu, J.L., Liu, C.C., Chen, A.L.P.: Discovering Nontrivial Repeating Patterns in Music Data. IEEE Trans. Multimed. 3(3), 311–325 (2001)
Lesh, N., Zaki, M.J., Ogihara, M.: Mining Features for Sequence Classification. In: Proc. of ACM International Conf. on Knowledge Discovery and Data Mining, pp. 342–346 (1999)
Pei, J., Han, J.W., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.C.: PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In: Proc. of IEEE International Conf. on Data Engineering, pp. 215–224 (2001)
Pei, J., Han, J.W., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.C.: Mining Sequential Patterns by Pattern Growth: The PrefixSpan Approach. IEEE Trans. Knowl. Data Eng. 16(11), 1424–1440 (2004)
Pei, J., Han, J.W., Wang, W.: Mining Sequential Patterns with Constraints in Large Databases. In: Proc. of ACM Conf. on Information and Knowledge Management (2002)
Pinto, H., Han, J.W., Pei, J., Wang, K., Chen, Q., Dayal, U.: Multi-Dimensional Sequential Pattern Mining. In: Proc. of ACM International Conf. on Information and Knowledge Management, pp. 81–88 (2001)
Raissi, C., Poncelet, P., Teisseire, M.: SPEED: Mining Maximal Sequential Patterns over Data Streams. In: Proc. of IEEE International Conf. on Intelligent Systems, pp. 546–552 (2006)
Rolland, P.Y.: FlExPat: Flexible Extraction of Sequential Patterns. In: Proc. of IEEE International Conf. on Data Mining, pp. 481–488 (2001)
She, C., Tang, J., Li, L., Wang, H., Fan, Z.: An Improved Parallel Algorithm for Sequence Mining. In: Proc. of IEEE International Conf. on Mechatronics and Automation, pp. 1692–1696 (2005)
Srikant, R., Agrawal, R.: Mining Sequential Patterns: Generalizations and Performance Improvements. In: Proc. of International Conf. on Extending Database Technology (1996)
Weiss, M.A.: Data Structures and Algorithm Analysis in C, 2nd edn. Addison-Wesley, Reading (1997)
Wesselink, J.J., Iglesia, B. et al.: Determining a Unique Defining DNA Sequence for Yeast Species Using Hashing Techniques. Bioinformatics 18(7), 1004–1010 (2002)
Wu, Y.H., Chen, A.L.P.: Prediction of Web Page Accesses by Proxy Server Log. World Wide Web: Internet Web Inf. Syst. 5(1), 67–88 (2002)
Yang, J., Wang, W., Yu, P.S., Han, J.W.: Mining Long Sequential Patterns in a Noisy Environment. In: Proc. of ACM International Conf. on Management of Data (2002)
Zaki, M.J.: Efficient Enumeration of Frequent Sequences. In: Proc. of ACM International Conf. on Information and Knowledge Management, pp. 68–75 (1998)
Zaki, M.J.: SPADE: An Efficient Algorithm for Mining Frequent Sequences. Mach. Learn. 42(1), 31–60 (2001)
Additional information
This paper is a substantially extended version of the following paper: D. Y. Chiu, Y. H. Wu, A. L. P. Chen, “An Efficient Algorithm for Mining Frequent Sequences by a New Strategy without Support Counting,” Proceedings of IEEE Data Engineering Conference, pp. 375–386, 2004.
About this article
Cite this article
Chiu, DY., Wu, YH. & Chen, A.L.P. Efficient frequent sequence mining by a dynamic strategy switching algorithm. The VLDB Journal 18, 303–327 (2009). https://doi.org/10.1007/s00778-008-0100-7
DOI: https://doi.org/10.1007/s00778-008-0100-7