Fast Discovery of Generalized Sequential Patterns

Kryszkiewicz, Marzena; Skonieczny, Łukasz

doi:10.1007/978-3-319-77604-0_12

Part of the book series: Studies in Big Data (SBD, volume 40)

Abstract

Knowledge in the form of generalized sequential patterns has many applications. In this paper, we focus on optimizing GSP, a well-known algorithm for discovering such patterns. Our optimization makes the identification of the nodes to be visited more selective while traversing the hash tree that stores candidates for generalized sequential patterns. It exploits the fact that the elements of candidate sequences are stored as ordered sets of items. To further reduce the number of visited nodes in the hash tree, we propose using not only the windowSize and maxGap parameters, as in the original GSP, but also the minGap parameter. As a result, the number of candidates that require the final, time-consuming verification may decrease considerably. In our experiments, the optimized variant of GSP was several times faster than standard GSP.
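The "final, time-consuming verification" is the check of whether a data sequence really contains a candidate under the time constraints. The following is a minimal brute-force sketch of that containment test, assuming the windowSize, minGap and maxGap semantics of the original GSP as we understand them (Srikant and Agrawal, EDBT 1996) and ignoring item taxonomies. The function name contains and its parameter names are hypothetical; this is not the authors' optimized procedure.

```python
from typing import FrozenSet, Sequence, Tuple

# A data sequence is a list of (transaction_time, itemset) pairs sorted by time.
Transaction = Tuple[float, FrozenSet[str]]


def contains(data: Sequence[Transaction], candidate: Sequence[FrozenSet[str]],
             window_size: float, min_gap: float, max_gap: float) -> bool:
    """Brute-force test whether `data` contains `candidate` under GSP-style time constraints.

    Assumed semantics: element i of the candidate must be covered by the union of
    transactions l..u with t[u] - t[l] <= window_size, and consecutive elements must
    satisfy t[l_i] - t[u_{i-1}] > min_gap and t[u_i] - t[l_{i-1}] <= max_gap.
    Taxonomies (the "generalized" item hierarchy) are not handled here.
    """
    n = len(data)

    def match(i: int, prev_l: int, prev_u: int) -> bool:
        if i == len(candidate):
            return True  # every candidate element has been matched
        for l in range(prev_u + 1, n):
            t_l = data[l][0]
            if i > 0 and t_l - data[prev_u][0] <= min_gap:
                continue  # too close to the previous element
            covered = set()
            for u in range(l, n):
                t_u = data[u][0]
                if t_u - t_l > window_size:
                    break  # window exceeded; times only grow with u
                if i > 0 and t_u - data[prev_l][0] > max_gap:
                    break  # maximum gap exceeded; times only grow with u
                covered |= data[u][1]
                if candidate[i] <= covered and match(i + 1, l, u):
                    return True
        return False

    return match(0, -1, -1)
```

For example, with item a at time 1, b at time 2 and c at time 5, the candidate <(a)(c)> is rejected when max_gap=3 (5 − 1 > 3) but accepted when max_gap=5; the candidate <(a, b)> is found only when window_size is at least 1. The exhaustive search over windows is exactly what makes this step expensive, which is why pruning candidates before it matters.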

Notes

1. In our implementation of the hash tree, this restriction is kept for leaves at levels not exceeding |Flattened(S)|, but leaves at level |Flattened(S)| + 1 are allowed to store more than m candidate sequences.
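The note can be illustrated with a minimal GSP-style hash tree sketch. It assumes that Flattened(S) lists the items of S element by element (each element's items in sorted order), that all candidates in one pass share the same flattened length k, that an interior node at depth d (root at depth 1) hashes on the d-th flattened item, and that m is the leaf capacity. The names HashTree, fanout and _bucket are hypothetical. Because a leaf at depth k + 1 has no item left to hash on, it cannot be split and may hold more than m candidates.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# A candidate is a sequence of elements; each element is a sorted tuple of items.
Candidate = Tuple[Tuple[str, ...], ...]


def flattened(s: Candidate) -> Tuple[str, ...]:
    # Assumed meaning of Flattened(S): all items, element by element, in stored order.
    return tuple(item for element in s for item in element)


class Node:
    def __init__(self, depth: int) -> None:
        self.depth = depth  # the root has depth 1
        self.children: Optional[Dict[int, "Node"]] = None  # None means leaf
        self.candidates: List[Candidate] = []


class HashTree:
    def __init__(self, k: int, m: int, fanout: int = 7) -> None:
        self.k = k  # |Flattened(S)|, shared by all candidates of this pass
        self.m = m  # leaf capacity for leaves at depth <= k
        self.fanout = fanout
        self.root = Node(1)

    def _bucket(self, item: str) -> int:
        # Python's str hash is stable within one process, which is all we need here.
        return hash(item) % self.fanout

    def insert(self, s: Candidate) -> None:
        flat = flattened(s)
        assert len(flat) == self.k, "all candidates in a pass must have length k"
        node = self.root
        while node.children is not None:  # descend by hashing the depth-th item
            item = flat[node.depth - 1]
            node = node.children.setdefault(self._bucket(item), Node(node.depth + 1))
        node.candidates.append(s)
        self._maybe_split(node)

    def _maybe_split(self, leaf: Node) -> None:
        # Leaves at depth k + 1 are never split, so they may exceed m.
        if len(leaf.candidates) <= self.m or leaf.depth > self.k:
            return
        pending, leaf.candidates, leaf.children = leaf.candidates, [], {}
        for s in pending:  # redistribute on the item at this node's depth
            item = flattened(s)[leaf.depth - 1]
            child = leaf.children.setdefault(self._bucket(item), Node(leaf.depth + 1))
            child.candidates.append(s)
        for child in list(leaf.children.values()):
            self._maybe_split(child)

    def leaves(self) -> Iterator[Node]:
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node.children is None:
                yield node
            else:
                stack.extend(node.children.values())
```

For instance, with k = 2 and m = 1, the candidates <(a)(b)> and <(a, b)> share the flattened form (a, b), so they follow the same path and end up together in a leaf at depth 3 that holds more than m candidates.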

References

Agrawal, R., Srikant, R.: Mining sequential patterns. In: ICDE 1995, pp. 3–14. IEEE Computer Society (1995)
Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential pattern mining using a bitmap representation. In: KDD 2002, pp. 429–435. ACM (2002)
Fournier-Viger, P., Lin, J.C.W., Kiran, R.U., Koh, Y.S., Thomas, R.: A survey of sequential pattern mining. Data Sci. Pattern Recognit. 1(1), 54–77 (2017)
Garofalakis, M.N., Rastogi, R., Shim, K.: SPIRIT: sequential pattern mining with regular expression constraints. VLDB J. 223–234 (1999)
Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.: FreeSpan: frequent pattern-projected sequential pattern mining. In: KDD 2000, pp. 355–359. ACM (2000)
IBM Almaden Quest Research Group: Quest Synthetic Data Generator
Lin, M.Y., Lee, S.Y.: Fast discovery of sequential patterns by memory indexing. In: DaWaK 2002. LNCS, vol. 2454, pp. 150–160. Springer (2002)
Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.: PrefixSpan: mining sequential patterns by prefix-projected growth. In: ICDE 2001, pp. 215–224. IEEE Computer Society (2001)
Pei, J., Han, J., Wang, W.: Mining sequential patterns with constraints in large databases. In: CIKM 2002, pp. 18–25. ACM (2002)
Protaziuk, G., Kryszkiewicz, M., Rybinski, H., Delteil, A.: Discovering compound and proper nouns. In: RSEISP 2007. LNCS, vol. 4585, pp. 505–515. Springer (2007)
Srikant, R., Agrawal, R.: Mining sequential patterns: generalizations and performance improvements. In: EDBT 1996. LNCS, vol. 1057, pp. 3–17. Springer (1996)
Wang, J., Han, J.: BIDE: efficient mining of frequent closed sequences. In: ICDE 2004, pp. 79–90. IEEE Computer Society (2004)
Yan, X., Han, J., Afshar, R.: CloSpan: mining closed sequential patterns in large datasets. In: SDM 2003, pp. 166–177. SIAM (2003)
Zaki, M.J.: SPADE: an efficient algorithm for mining frequent sequences. Mach. Learn. 42(1/2), 31–60 (2001)

Author information

Authors and Affiliations

Institute of Computer Science, Warsaw University of Technology, Nowowiejska 15/19, 00-665, Warsaw, Poland
Marzena Kryszkiewicz & Łukasz Skonieczny

Corresponding author

Correspondence to Marzena Kryszkiewicz.

Editor information

Editors and Affiliations

Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland
Robert Bembenik, Łukasz Skonieczny, Grzegorz Protaziuk, Marzena Kryszkiewicz & Henryk Rybinski

About this chapter

Cite this chapter

Kryszkiewicz, M., Skonieczny, Ł. (2019). Fast Discovery of Generalized Sequential Patterns. In: Bembenik, R., Skonieczny, Ł., Protaziuk, G., Kryszkiewicz, M., Rybinski, H. (eds) Intelligent Methods and Big Data in Industrial Applications. Studies in Big Data, vol 40. Springer, Cham. https://doi.org/10.1007/978-3-319-77604-0_12

DOI: https://doi.org/10.1007/978-3-319-77604-0_12
Published: 19 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77603-3
Online ISBN: 978-3-319-77604-0
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)