BIDE-Based Parallel Mining of Frequent Closed Sequences with MapReduce

Yu, Dongjin; Wu, Wei; Zheng, Suhang; Zhu, Zhixiang

doi:10.1007/978-3-642-33065-0_19

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7440)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

Parallel processing is essential to mining frequent closed sequences from massive volumes of data in a timely manner. MapReduce, in turn, is a software framework well suited to distributed computing over large data sets on clusters of computers. In this paper, we develop a parallel implementation of the BIDE algorithm on MapReduce, called BIDE-MR. It iteratively assigns the tasks of closure checking and pruning to different nodes in the cluster. Each round of map-combine-partition-reduce generates the closed frequent sequences of the length specific to that round, together with the candidates for the next round of computation. Since the candidates and their pseudo-projected databases are independent of each other, BIDE-MR achieves high speed-ups. We implement BIDE-MR on an Apache Hadoop cluster and use it to mine vehicles that frequently appear together from massive records collected at different monitoring sites. The results show that BIDE-MR attains good parallelization.
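
The round structure described in the abstract can be illustrated with a small single-process sketch in Python. This is not the authors' BIDE-MR implementation, and it does not use Hadoop. All function names (`map_phase`, `reduce_phase`, `mine_closed`) are hypothetical, the combine and partition steps are folded into one in-memory grouping, and BIDE's bi-directional extension check is replaced by a naive closure test: a frequent sequence of length k is reported as closed if no frequent sequence of length k+1 contains it with the same support. Under this simplification, the closed sequences of length k become known one round later than their candidates are generated.

```python
from collections import defaultdict


def is_subsequence(short, long):
    """Return True if `short` can be obtained from `long` by deleting items."""
    it = iter(long)
    return all(item in it for item in short)


def map_phase(database, candidates):
    """Emit (extended_pattern, sequence_id) for every one-item extension.

    For each candidate prefix, the leftmost match in a sequence is located;
    the remaining suffix plays the role of a pseudo-projected database.
    """
    for sid, seq in enumerate(database):
        for cand in candidates:
            pos = 0
            matched = True
            for item in cand:
                try:
                    pos = seq.index(item, pos) + 1
                except ValueError:
                    matched = False
                    break
            if not matched:
                continue
            # Each distinct item in the suffix yields one extension.
            for item in set(seq[pos:]):
                yield cand + (item,), sid


def reduce_phase(pairs, min_support):
    """Group by pattern, count supporting sequences, prune infrequent ones."""
    groups = defaultdict(set)
    for pattern, sid in pairs:
        groups[pattern].add(sid)
    return {p: len(s) for p, s in groups.items() if len(s) >= min_support}


def mine_closed(database, min_support):
    """Run rounds until no candidates remain; return {pattern: support}."""
    database = [tuple(seq) for seq in database]
    closed = {}
    previous = {}       # frequent patterns found in the previous round
    candidates = [()]   # round 1 extends the empty prefix
    while candidates:
        frequent = reduce_phase(map_phase(database, candidates), min_support)
        # Naive closure check (an assumption of this sketch, not BIDE's test).
        for pattern, support in previous.items():
            absorbed = any(
                s == support and is_subsequence(pattern, longer)
                for longer, s in frequent.items()
            )
            if not absorbed:
                closed[pattern] = support
        previous = frequent
        candidates = list(frequent)
    return closed
```

For example, `mine_closed(["CAABC", "ABCB", "CABC", "ABBCA"], 2)` returns six closed sequences: AA with support 2, ABB with 2, ABC with 4, CA with 3, CABC with 2, and CB with 3. Because each candidate only reads its own projected suffixes, the inner loop over candidates could be distributed across workers, which is the independence property the abstract relies on.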

Author information

Authors and Affiliations

School of Computer, Hangzhou Dianzi University, Hangzhou, China
Dongjin Yu, Suhang Zheng & Zhixiang Zhu
Zhejiang Provincial Key Laboratory of Network Technology and Information Security, Hangzhou, China
Wei Wu

Editor information

Editors and Affiliations

School of Information Technology, Deakin University, Melbourne Burwood Campus, 221 Burwood Highway, 3125, Burwood, VIC, Australia
Yang Xiang
SEECS, University of Ottawa, 8, King Edward Ave, K1N 6N5, Ottawa, ON, Canada
Ivan Stojmenovic
Department of Intelligent Informatics, Kyushu Sangyo University, 2-3-1 Matsukadai, Higashi-ku, 813-8503, Fukuoka, Japan
Bernady O. Apduhan
School of Information Science and Engineering, Central South University, 410083, Changsha, Hunan Province, P.R. China
Guojun Wang
Department of Information Engineering, Hiroshima University, 1-4-1, Kagamiyama, 739-8527, Higashi-Hiroshima, Japan
Koji Nakano
School of Information Technologies, University of Sydney, Building J12, 2006, Sydney, NSW, Australia
Albert Zomaya

About this paper

Cite this paper

Yu, D., Wu, W., Zheng, S., Zhu, Z. (2012). BIDE-Based Parallel Mining of Frequent Closed Sequences with MapReduce. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33065-0_19

DOI: https://doi.org/10.1007/978-3-642-33065-0_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33064-3
Online ISBN: 978-3-642-33065-0
eBook Packages: Computer Science (R0)
