Fast Discovery of Generalized Sequential Patterns

Chapter in: Intelligent Methods and Big Data in Industrial Applications

Part of the book series: Studies in Big Data (SBD, volume 40)

Abstract

Knowledge in the form of generalized sequential patterns finds many applications. In this paper, we focus on optimizing GSP, a well-known algorithm for discovering such patterns. Our optimization consists of a more selective identification of the nodes to be visited while traversing a hash tree that stores candidates for generalized sequential patterns. It is based on the fact that elements of candidate sequences are stored as ordered sets of items. To reduce the number of visited nodes in the hash tree, we also propose using not only the parameters windowSize and maxGap, as in the original GSP, but also the parameter minGap. As a result of our optimization, the number of candidates that require the final, time-consuming verification may be considerably decreased. In the experiments we carried out, our optimized variant of GSP was several times faster than standard GSP.
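
As a rough illustration of the final verification step that the abstract calls time-consuming, the Python sketch below checks whether a single data sequence supports a candidate under windowSize, minGap and maxGap. It applies, by brute force, the containment definition of the original GSP formulation (Srikant and Agrawal, 1996); it is not the hash-tree optimization proposed in this chapter. The function name `supports`, the `(time, items)` transaction layout, and the assumption that transactions are sorted by strictly increasing time are illustrative choices rather than details taken from the chapter.

```python
from typing import FrozenSet, Iterator, List, Optional, Tuple

Transaction = Tuple[int, FrozenSet[str]]   # (transaction time, items); layout is an assumption


def supports(data: List[Transaction], candidate: List[FrozenSet[str]],
             window_size: int, min_gap: int, max_gap: int) -> bool:
    """Brute-force check that `data` contains `candidate` under the GSP time constraints.

    `data` is assumed to be sorted by strictly increasing transaction time.
    """
    n = len(data)

    def windows(element: FrozenSet[str], start: int) -> Iterator[Tuple[int, int]]:
        # For each left index l >= start, yield the smallest right index u such that
        # the union of data[l..u] covers `element` within window_size. A larger u can
        # only tighten the gap constraints on the next element, so it is skipped.
        for l in range(start, n):
            covered = set()
            for u in range(l, n):
                if data[u][0] - data[l][0] > window_size:
                    break
                covered |= data[u][1]
                if element <= covered:
                    yield l, u
                    break

    def match(i: int, prev_l: Optional[int], prev_u: Optional[int]) -> bool:
        if i == len(candidate):
            return True
        start = 0 if prev_u is None else prev_u + 1
        for l, u in windows(candidate[i], start):
            if prev_l is not None and prev_u is not None:
                # minGap: the gap between consecutive elements must exceed min_gap.
                if data[l][0] - data[prev_u][0] <= min_gap:
                    continue
                # maxGap: end of this element minus start of the previous one.
                if data[u][0] - data[prev_l][0] > max_gap:
                    continue
            if match(i + 1, l, u):
                return True
        return False

    return match(0, None, None)


# Toy data sequence and candidate <{a, b}, {c}>; all values are made up for illustration.
data = [(1, frozenset("a")), (2, frozenset("b")), (5, frozenset("c")), (10, frozenset("d"))]
candidate = [frozenset("ab"), frozenset("c")]
print(supports(data, candidate, window_size=1, min_gap=2, max_gap=4))
```

Because this check enumerates window positions for every element, its cost grows quickly with sequence length, which is why reducing the number of candidates that reach it matters.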

Notes

  1. In our implementation of the hash tree, this restriction is kept for leaves at levels not exceeding |Flattened(S)|, but leaves at level |Flattened(S)| + 1 are allowed to store more than m candidate sequences.
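
The leaf-capacity rule in this note can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not stated on this page: S denotes a candidate sequence, so |Flattened(S)| is the number of items k in each candidate; the root is at level 1; an interior node at level p hashes on the p-th item of the flattened candidate; and the hash function is `item % fanout` on integer items. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Candidate = Tuple[Tuple[int, ...], ...]   # elements stored as sorted tuples of integer items


def flatten(candidate: Candidate) -> List[int]:
    """Concatenate the items of all elements, keeping their order."""
    return [item for element in candidate for item in element]


class Node:
    def __init__(self, level: int) -> None:
        self.level = level                                  # root is level 1 (assumption)
        self.children: Optional[Dict[int, "Node"]] = None   # None while the node is a leaf
        self.candidates: List[Candidate] = []               # used only while a leaf


class HashTree:
    def __init__(self, k: int, m: int, fanout: int = 3) -> None:
        self.k = k            # |Flattened(S)|: items per candidate
        self.m = m            # leaf capacity enforced at levels 1..k
        self.fanout = fanout  # number of hash buckets per interior node (assumption)
        self.root = Node(level=1)

    def insert(self, candidate: Candidate) -> None:
        assert len(flatten(candidate)) == self.k
        self._add(self.root, candidate)

    def _add(self, node: Node, candidate: Candidate) -> None:
        flat = flatten(candidate)
        # Descend: an interior node at level p hashes on the p-th item.
        while node.children is not None:
            bucket = flat[node.level - 1] % self.fanout
            node = node.children.setdefault(bucket, Node(node.level + 1))
        node.candidates.append(candidate)
        # Split an overflowing leaf only while there is still an item to hash on.
        # A leaf at level k + 1 has no such item, so it may hold more than m candidates.
        if len(node.candidates) > self.m and node.level <= self.k:
            pending, node.candidates, node.children = node.candidates, [], {}
            for c in pending:
                self._add(node, c)


# Three candidates whose items all fall into bucket 0, forcing splits down to level k + 1.
tree = HashTree(k=2, m=1)
for cand in [((3,), (6,)), ((3,), (9,)), ((3, 6),)]:
    tree.insert(cand)
deepest = tree.root.children[0].children[0]
print(deepest.level, len(deepest.candidates))
```

In this toy run all three candidates collide at every level, so they end up in one leaf at level k + 1, which is exactly the case where the capacity m cannot be enforced by further splitting.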

Author information

Corresponding author

Correspondence to Marzena Kryszkiewicz.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Kryszkiewicz, M., Skonieczny, Ł. (2019). Fast Discovery of Generalized Sequential Patterns. In: Bembenik, R., Skonieczny, Ł., Protaziuk, G., Kryszkiewicz, M., Rybinski, H. (eds) Intelligent Methods and Big Data in Industrial Applications. Studies in Big Data, vol 40. Springer, Cham. https://doi.org/10.1007/978-3-319-77604-0_12
