Abstract
Knowledge in the form of generalized sequential patterns has many applications. In this paper, we focus on optimizing GSP, a well-known algorithm for discovering such patterns. Our optimization consists in identifying more selectively the nodes to be visited while traversing a hash tree that stores candidates for generalized sequential patterns. It relies on the fact that elements of candidate sequences are stored as ordered sets of items. To reduce the number of visited nodes in the hash tree, we also propose using not only the parameters windowSize and maxGap, as in the original GSP, but also the parameter minGap. As a result of our optimization, the number of candidates that require final, time-consuming verification may be considerably decreased. In the experiments we carried out, our optimized variant of GSP was several times faster than standard GSP.
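To make the roles of windowSize, minGap and maxGap concrete, the following minimal sketch checks the three time constraints for one tentative match of a candidate sequence, following the constraint definitions of the original GSP formulation as commonly stated. It is an illustration only and does not reproduce the optimized hash tree traversal proposed in the chapter. The function name `satisfies_time_constraints` and the `(start_time, end_time)` representation of the transactions matched to each candidate element are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical representation: for each element of the candidate sequence,
# the (start_time, end_time) of the data-sequence transactions matched to it.
Window = Tuple[int, int]


def satisfies_time_constraints(
    windows: List[Window], window_size: int, min_gap: int, max_gap: int
) -> bool:
    """Check windowSize, minGap and maxGap for one tentative match."""
    for i, (start, end) in enumerate(windows):
        # windowSize: transactions supporting one element must lie close together.
        if end < start or end - start > window_size:
            return False
        if i > 0:
            prev_start, prev_end = windows[i - 1]
            # minGap: this element must start strictly more than min_gap
            # after the previous element ends.
            if start - prev_end <= min_gap:
                return False
            # maxGap: this element must end at most max_gap
            # after the previous element starts.
            if end - prev_start > max_gap:
                return False
    return True


match = [(1, 2), (5, 5)]
print(satisfies_time_constraints(match, window_size=1, min_gap=2, max_gap=5))
print(satisfies_time_constraints(match, window_size=1, min_gap=3, max_gap=5))
```

In this sketch, a match that violates any of the three constraints is rejected immediately, which is the kind of test that pruning by minGap, in addition to windowSize and maxGap, aims to apply before the final verification of a candidate.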
Notes
1. In our implementation of the hash tree, this restriction is kept for leaves at levels not exceeding |Flattened(S)|, but leaves at level |Flattened(S)| + 1 are allowed to store more than m candidate sequences.
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Kryszkiewicz, M., Skonieczny, Ł. (2019). Fast Discovery of Generalized Sequential Patterns. In: Bembenik, R., Skonieczny, Ł., Protaziuk, G., Kryszkiewicz, M., Rybinski, H. (eds) Intelligent Methods and Big Data in Industrial Applications. Studies in Big Data, vol 40. Springer, Cham. https://doi.org/10.1007/978-3-319-77604-0_12
DOI: https://doi.org/10.1007/978-3-319-77604-0_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77603-3
Online ISBN: 978-3-319-77604-0
eBook Packages: Intelligent Technologies and Robotics (R0)