Abstract
Frequent closed sequential pattern mining plays an important role in sequence data mining and has a wide range of real-world applications, such as protein sequence analysis, financial data investigation, and user behavior prediction. In previous studies of frequent closed sequential pattern mining, the gap constraint is a parameter that the user must predefine. However, users who lack sufficient a priori knowledge find it difficult to set suitable gap constraints. Furthermore, different gap constraints may lead to different results, and some useful patterns may be missed if the gap constraint is chosen inappropriately. To address this, we present a novel problem: mining frequent closed sequential patterns with non-user-defined gap constraints. We also propose an efficient algorithm that finds the frequent closed sequential patterns with the most suitable gap constraints. Our empirical study on protein data sets demonstrates that the algorithm is effective and efficient.
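The abstract does not spell out the authors' algorithm, so the following is only a brute-force sketch of the underlying problem under assumed definitions. A gap is taken to be the number of characters skipped between two consecutively matched pattern items, and it must lie in [min_gap, max_gap]. Support counts the sequences that contain at least one gap-satisfying occurrence. A frequent pattern is treated as closed when no one-item extension at either end has the same support. The function names (`occurs`, `support`, `frequent_patterns`, `closed_patterns`) and the toy database are hypothetical. This is not the paper's method, and it does not select a "most suitable" gap. It only shows how changing a fixed, user-chosen gap constraint changes which closed patterns are reported.

```python
from typing import Dict, List


def occurs(seq: str, pattern: str, min_gap: int, max_gap: int) -> bool:
    """True if `pattern` occurs in `seq` with every gap in [min_gap, max_gap].

    Assumed definition: a gap is the number of characters skipped between
    two consecutively matched pattern items.
    """
    # Positions where a partial match of the current pattern prefix ends.
    ends = {i for i, ch in enumerate(seq) if ch == pattern[0]}
    for item in pattern[1:]:
        next_ends = set()
        for i in ends:
            # Gap g = j - i - 1 must satisfy min_gap <= g <= max_gap.
            for j in range(i + 1 + min_gap, min(i + 2 + max_gap, len(seq))):
                if seq[j] == item:
                    next_ends.add(j)
        ends = next_ends
        if not ends:
            return False
    return bool(ends)


def support(db: List[str], pattern: str, min_gap: int, max_gap: int) -> int:
    """Number of sequences with at least one gap-satisfying occurrence."""
    return sum(occurs(s, pattern, min_gap, max_gap) for s in db)


def frequent_patterns(db: List[str], min_sup: int,
                      min_gap: int, max_gap: int) -> Dict[str, int]:
    """Breadth-first pattern growth that appends one item at a time.

    Appending an item never increases support under a gap constraint,
    so infrequent patterns can be pruned safely.
    """
    assert min_sup >= 1 and 0 <= min_gap <= max_gap
    alphabet = sorted(set("".join(db)))
    result: Dict[str, int] = {}
    frontier = list(alphabet)
    while frontier:
        next_frontier = []
        for p in frontier:
            sup = support(db, p, min_gap, max_gap)
            if sup >= min_sup:
                result[p] = sup
                next_frontier.extend(p + ch for ch in alphabet)
        frontier = next_frontier
    return result


def closed_patterns(db: List[str], min_sup: int,
                    min_gap: int, max_gap: int) -> Dict[str, int]:
    """Frequent patterns with no equal-support one-item extension at either end.

    This closedness test is an assumption made for illustration only.
    """
    freq = frequent_patterns(db, min_sup, min_gap, max_gap)
    alphabet = sorted(set("".join(db)))
    closed: Dict[str, int] = {}
    for p, sup in freq.items():
        extensions = [p + ch for ch in alphabet] + [ch + p for ch in alphabet]
        if all(freq.get(q, 0) != sup for q in extensions):
            closed[p] = sup
    return closed


if __name__ == "__main__":
    db = ["AXB", "AB", "AXXB"]  # hypothetical toy database
    print(closed_patterns(db, 2, 0, 0))  # {'A': 3, 'B': 3, 'AX': 2, 'XB': 2}
    print(closed_patterns(db, 2, 0, 1))  # {'A': 3, 'B': 3, 'AB': 2, 'AXB': 2}
```

With `min_sup=2`, the gap constraint [0, 0] reports AX and XB as closed. The constraint [0, 1] instead reports AB and AXB. A single fixed gap therefore hides patterns that another gap would reveal.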
This work was supported in part by NSFC 61103042, SRFDP 20100181120029, SKLSE2012-09-32, and China Postdoctoral Science Foundation 2014M552371.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, W. et al. (2014). Mining Frequent Closed Sequential Patterns with Non-user-defined Gap Constraints. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, vol 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_5
DOI: https://doi.org/10.1007/978-3-319-14717-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14716-1
Online ISBN: 978-3-319-14717-8
eBook Packages: Computer Science, Computer Science (R0)