BIDE-Based Parallel Mining of Frequent Closed Sequences with MapReduce

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7440)

Abstract

Parallel processing is essential for mining frequent closed sequences from massive volumes of data in a timely manner, and MapReduce is an ideal software framework for distributed computing over large data sets on clusters of computers. In this paper, we develop a parallel implementation of the BIDE algorithm on MapReduce, called BIDE-MR. It iteratively assigns the tasks of closure checking and pruning to different nodes in the cluster. After one round of map-combine-partition-reduce, the closed frequent sequences of the round-specific length and the candidates for the next round of computation are generated. Since the candidates and their pseudo-projected databases are independent of each other, BIDE-MR achieves high speed-ups. We implement BIDE-MR on an Apache Hadoop cluster and use it to mine the vehicles that frequently appear together from massive records collected at different monitoring sites. The results show that BIDE-MR attains good parallelization.
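The round structure described in the abstract can be illustrated with a small single-process sketch. It is not the authors' BIDE-MR implementation, and it does not use Hadoop or BIDE's bi-directional extension check. It assumes sequences of single items, simulates each round with a map step (extend every frequent prefix by one item taken from the pseudo-projected suffix) and a reduce step (count support and prune), and decides closure by brute force: a length-k pattern is reported as closed when no frequent length-(k+1) super-sequence has the same support. All function names (`map_phase`, `reduce_phase`, `is_subsequence`, `mine_closed`) and the toy database are hypothetical.

```python
from collections import defaultdict


def map_phase(db, prefixes):
    """Emit (candidate, seq_id) pairs for one round.

    For each sequence and each prefix, find the earliest match of the
    prefix; the items after that match play the role of the
    pseudo-projected suffix, and each distinct item there yields a candidate.
    """
    for seq_id, seq in enumerate(db):
        for prefix in prefixes:
            pos = 0
            matched = True
            for item in prefix:
                try:
                    pos = seq.index(item, pos) + 1
                except ValueError:
                    matched = False
                    break
            if matched:
                for item in set(seq[pos:]):
                    yield prefix + (item,), seq_id


def reduce_phase(pairs, min_sup):
    """Group pairs by candidate, count supporting sequences, prune by min_sup."""
    groups = defaultdict(set)
    for candidate, seq_id in pairs:
        groups[candidate].add(seq_id)
    return {c: len(ids) for c, ids in groups.items() if len(ids) >= min_sup}


def is_subsequence(small, big):
    """True if the items of `small` occur in `big` in order (gaps allowed)."""
    it = iter(big)
    return all(item in it for item in small)


def mine_closed(db, min_sup):
    """Level-wise mining: round k yields the frequent patterns of length k.

    Brute-force closure test (an assumption of this sketch, not BIDE's
    check): a pattern is closed if no frequent pattern one item longer
    contains it with equal support.
    """
    closed = {}
    current = reduce_phase(map_phase(db, [()]), min_sup)
    while current:
        longer = reduce_phase(map_phase(db, list(current)), min_sup)
        for pattern, sup in current.items():
            absorbed = any(
                s == sup and is_subsequence(pattern, q)
                for q, s in longer.items()
            )
            if not absorbed:
                closed[pattern] = sup
        current = longer
    return closed


# Toy database of four item sequences (illustrative data only).
db = [tuple("CAABC"), tuple("ABCB"), tuple("CABC"), tuple("ABBCA")]
closed = mine_closed(db, min_sup=2)
for pattern, sup in sorted(closed.items()):
    print("".join(pattern), sup)
```

For this toy database with a minimum support of 2, tracing the rounds by hand gives six closed patterns: AA:2, ABB:2, ABC:4, CA:3, CABC:2, and CB:3. Each prefix in the map step is handled independently of the others, which is the property the abstract credits for the speed-up when prefixes are spread across cluster nodes.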

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yu, D., Wu, W., Zheng, S., Zhu, Z. (2012). BIDE-Based Parallel Mining of Frequent Closed Sequences with MapReduce. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33065-0_19

  • DOI: https://doi.org/10.1007/978-3-642-33065-0_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33064-3

  • Online ISBN: 978-3-642-33065-0

  • eBook Packages: Computer Science, Computer Science (R0)
