Abstract
Parallel processing is essential for mining frequent closed sequences from massive volumes of data in a timely manner. MapReduce, in turn, is an ideal software framework for supporting distributed computing on large data sets across clusters of computers. In this paper, we develop a parallel implementation of the BIDE algorithm on MapReduce, called BIDE-MR. It iteratively assigns the tasks of closure checking and pruning to different nodes in the cluster. Each round of map-combine-partition-reduce generates the closed frequent sequences of the length associated with that round, together with the candidates for the next round of computation. Because the candidates and their pseudo-projected databases are independent of each other, they can be processed in parallel, and BIDE-MR achieves high speed-ups. We implement BIDE-MR on an Apache Hadoop cluster and use it to mine vehicles that frequently appear together in massive records collected at different monitoring sites. The results show that BIDE-MR attains good parallelization.
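The round structure described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' BIDE-MR code. The following are all assumptions made for illustration:

- The helper names `map_task`, `mr_round`, `is_closed` and `mine_closed` are hypothetical.
- Each candidate prefix becomes one independent map task.
- An in-memory dictionary stands in for Hadoop's combine and partition stages.
- The closure check is brute force. It tests the same condition BIDE checks (no one-item forward or backward extension with equal support), but it rescans the whole database instead of using BIDE's bi-directional extension checking.
- BIDE's own pruning step is omitted.

Round k emits the closed sequences of length k and the frequent length-k prefixes, with their pseudo-projections, that seed round k + 1.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Pattern = Tuple[str, ...]
# Pseudo-projection: (sequence id, offset where the remaining suffix starts).
Projection = List[Tuple[int, int]]


def is_subsequence(pattern: Pattern, seq: Pattern) -> bool:
    """True if pattern occurs in seq in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def support(pattern: Pattern, db: List[Pattern]) -> int:
    """Number of sequences in db that contain pattern."""
    return sum(is_subsequence(pattern, seq) for seq in db)


def is_closed(pattern: Pattern, sup: int, db: List[Pattern], alphabet: List[str]) -> bool:
    """Hypothetical brute-force closure check: pattern is closed iff no
    one-item insertion (before, between, or after its items) keeps the same support."""
    for pos in range(len(pattern) + 1):
        for item in alphabet:
            if support(pattern[:pos] + (item,) + pattern[pos:], db) == sup:
                return False
    return True


def map_task(prefix: Pattern, proj: Projection,
             db: List[Pattern]) -> Iterator[Tuple[Pattern, Tuple[int, int]]]:
    """Hypothetical mapper for one candidate: for every projected suffix, emit
    the first occurrence of each distinct item as a one-item extension."""
    for sid, off in proj:
        seen = set()
        for pos in range(off, len(db[sid])):
            item = db[sid][pos]
            if item not in seen:
                seen.add(item)
                yield prefix + (item,), (sid, pos + 1)


def mr_round(candidates: Dict[Pattern, Projection], db: List[Pattern], min_sup: int,
             alphabet: List[str]) -> Tuple[List[Tuple[Pattern, int]], Dict[Pattern, Projection]]:
    """One simulated map-combine-partition-reduce round."""
    grouped: Dict[Pattern, Projection] = defaultdict(list)  # stands in for combine + partition
    for prefix, proj in candidates.items():  # candidates are independent tasks
        for key, value in map_task(prefix, proj, db):
            grouped[key].append(value)
    closed: List[Tuple[Pattern, int]] = []
    next_candidates: Dict[Pattern, Projection] = {}
    for pattern, proj in grouped.items():  # reduce: one key per extended pattern
        sup = len(proj)  # each sequence contributes at most one entry per key
        if sup < min_sup:
            continue
        next_candidates[pattern] = proj
        if is_closed(pattern, sup, db, alphabet):
            closed.append((pattern, sup))
    return closed, next_candidates


def mine_closed(db: List[Pattern], min_sup: int) -> List[Tuple[Pattern, int]]:
    """Run rounds until no candidates remain; round k yields length-k closed sequences."""
    alphabet = sorted({item for seq in db for item in seq})
    candidates: Dict[Pattern, Projection] = {(): [(sid, 0) for sid in range(len(db))]}
    result: List[Tuple[Pattern, int]] = []
    while candidates:
        closed, candidates = mr_round(candidates, db, min_sup, alphabet)
        result.extend(closed)
    return sorted(result)


if __name__ == "__main__":
    db = [tuple("abc"), tuple("ac"), tuple("abc")]
    print(mine_closed(db, 2))  # [(('a', 'b', 'c'), 2), (('a', 'c'), 3)]
```

In this toy database, `a` and `c` each have support 3 but are not closed, because `ac` has the same support. That leaves `ac` (support 3) and `abc` (support 2) as the closed frequent sequences. They are found in rounds 2 and 3 respectively.

Within `mr_round`, each candidate's map task reads only its own pseudo-projection. This is the independence the abstract relies on for parallel speed-ups. The brute-force closure check, by contrast, needs the full database. A real distributed version would instead need BIDE's local extension checks or a broadcast copy of the data.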
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yu, D., Wu, W., Zheng, S., Zhu, Z. (2012). BIDE-Based Parallel Mining of Frequent Closed Sequences with MapReduce. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33065-0_19
DOI: https://doi.org/10.1007/978-3-642-33065-0_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33064-3
Online ISBN: 978-3-642-33065-0
eBook Packages: Computer Science, Computer Science (R0)