An Efficient Two-Table Join Query Processing Based on Extended Bloom Filter in MapReduce

Wang, Junlu; Pang, Jun; Li, Xiaoyan; Han, Baishuo; Huang, Lei; Ding, Linlin

doi:10.1007/978-3-319-47121-1_21

Conference paper
First Online: 15 October 2016

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9998)

Abstract

With the development of cloud computing, the Internet of Things, and similar technologies, a large amount of data is being produced. MapReduce, a processing architecture for cloud computing, is widely used for large-scale data processing. However, when two tables are joined under the MapReduce data processing model, a great deal of data does not satisfy the join condition, yet it is still transferred from the map side to the reduce side. This adds time overhead and I/O cost in the shuffle stage and results in low efficiency. Improving join query processing on MapReduce has therefore become an urgent problem. In this paper, we put forward two-table join query processing and optimization strategies to address these problems. The optimized method extends the Bloom Filter, reduces the time spent in the shuffle phase, and improves the efficiency of the system.
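
The general idea of pruning non-joining records before the shuffle can be illustrated with a minimal single-process sketch. It assumes a plain Bloom filter, not the extended variant proposed in the paper, whose details are not given in the abstract. The names `BloomFilter`, `might_contain`, and `filtered_join`, as well as the filter size and hash count, are hypothetical and chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter (an assumption; not the paper's extended variant)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def filtered_join(small_table, large_table):
    """Equi-join two lists of (key, value) pairs.

    Returns the joined rows and the number of large-table records that
    would still be sent through the shuffle after filtering.
    """
    # Build the filter from the join keys of the smaller table.
    bloom = BloomFilter()
    for key, _ in small_table:
        bloom.add(key)

    # "Map" side: drop records whose key cannot possibly match.
    survivors = [(k, v) for k, v in large_table if bloom.might_contain(k)]

    # "Reduce" side: an exact hash join discards any false positives.
    index = {}
    for k, v in small_table:
        index.setdefault(k, []).append(v)
    joined = [(k, sv, lv) for k, lv in survivors for sv in index.get(k, [])]
    return joined, len(survivors)


if __name__ == "__main__":
    small = [(1, "a"), (2, "b")]
    large = [(1, "x"), (3, "y"), (2, "z"), (1, "w")]
    rows, shuffled = filtered_join(small, large)
    print(rows, shuffled)
```

Because a Bloom filter can report false positives but never false negatives, the filtering step may let a few non-matching records through, but it never drops a record that belongs in the join result; the exact join on the reduce side keeps the output correct.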

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61472169 and 61502215; the Science Research Normal Fund of Liaoning Province Education Department (L2015193); the Doctoral Scientific Research Start Foundation of Liaoning Province (201501127); and the Young Research Foundation of Liaoning University under Grant No. LDQN201438.

Author information

Authors and Affiliations

School of Information, Liaoning University, Shenyang, 110036, China
Junlu Wang, Xiaoyan Li, Baishuo Han, Lei Huang & Linlin Ding
School of Information Science and Engineering, Northeastern University, Shenyang, 110819, China
Jun Pang

Corresponding author

Correspondence to Linlin Ding.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Shaoxu Song
Beihang University, Beijing, China
Yongxin Tong

About this paper

Cite this paper

Wang, J., Pang, J., Li, X., Han, B., Huang, L., Ding, L. (2016). An Efficient Two-Table Join Query Processing Based on Extended Bloom Filter in MapReduce. In: Song, S., Tong, Y. (eds) Web-Age Information Management. WAIM 2016. Lecture Notes in Computer Science, vol 9998. Springer, Cham. https://doi.org/10.1007/978-3-319-47121-1_21

DOI: https://doi.org/10.1007/978-3-319-47121-1_21
Published: 15 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47120-4
Online ISBN: 978-3-319-47121-1
eBook Packages: Computer Science (R0)
