Distributed Sequential Pattern Mining in Large Scale Uncertain Databases

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9652)

Abstract

While sequential pattern mining (SPM) is an important application in uncertain databases, it poses challenges in efficiency and scalability. In this paper, we develop a dynamic programming (DP) approach to mine probabilistic frequent sequential patterns on the distributed computing platform Spark. Directly applying the DP method to Spark is impractical because its high memory consumption can cause heavy JVM garbage collection overhead. Therefore, we design a memory-efficient distributed DP approach and use an extended prefix-tree to store intermediate results efficiently. Extensive experiments at various scales show that our method is orders of magnitude faster than straightforward approaches.
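The abstract does not give the DP recurrence itself. As a rough illustration of the kind of computation involved, the sketch below uses a common formulation from the uncertain-data mining literature: it assumes each sequence in the database supports a pattern independently with a known probability, and computes the probability that the pattern's support reaches a threshold. The function name `frequentness_probability` and its parameters are hypothetical, and the single-row table is a generic way to limit memory, not the paper's memory-efficient scheme or its extended prefix-tree.

```python
from typing import Sequence


def frequentness_probability(support_probs: Sequence[float], min_sup: int) -> float:
    """Return P(support >= min_sup) for one candidate pattern.

    Assumption (not taken from the paper): sequence i supports the pattern
    independently with probability support_probs[i].
    """
    if min_sup <= 0:
        return 1.0
    if min_sup > len(support_probs):
        return 0.0

    # dp[j] for j < min_sup: probability that exactly j of the sequences
    # processed so far support the pattern.
    # dp[min_sup]: probability that at least min_sup of them do (absorbing).
    dp = [0.0] * (min_sup + 1)
    dp[0] = 1.0

    for p in support_probs:
        # Update from the highest index downward so that dp[j - 1] still
        # holds the value from before this sequence was processed.
        dp[min_sup] += dp[min_sup - 1] * p
        for j in range(min_sup - 1, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p

    return dp[min_sup]


if __name__ == "__main__":
    # Three uncertain sequences; is the pattern supported by at least two?
    print(frequentness_probability([0.9, 0.8, 0.5], min_sup=2))
```

Under these assumptions, a pattern would be reported as probabilistically frequent when the returned value meets a user-chosen confidence threshold. The table here needs only `min_sup + 1` entries per candidate pattern, regardless of the number of sequences.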

Author information

Corresponding author

Correspondence to Jiaqi Ge.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ge, J., Xia, Y. (2016). Distributed Sequential Pattern Mining in Large Scale Uncertain Databases. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_2

  • DOI: https://doi.org/10.1007/978-3-319-31750-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31749-6

  • Online ISBN: 978-3-319-31750-2

  • eBook Packages: Computer Science, Computer Science (R0)
