Abstract
MapReduce is the most popular platform used in cloud computing for large-scale data processing. Such processing generally involves multi-way Theta-join operations. Although a multi-way Theta-join can be processed in MapReduce using a sequence of MRJs (MapReduce Jobs), this incurs a high I/O cost because intermediate results must be stored between consecutive MRJs. We therefore focus on improving the performance of multi-way Theta-joins by reducing the number of MRJs. In this paper, a multi-way Theta-join is processed in only two MRJs: it is decomposed into a non-Equi-join and a multi-way Equi-join, and each of these join operations is processed in one MRJ. Our experiments show that our method performs well.
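The two-job decomposition can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the authors' algorithm: the abstract does not specify the order of the two jobs, the partitioning scheme, or the schemas. The relations R(k, a), S(b), T(k, t) and U(k, u), the query (R.a < S.b AND R.k = T.k AND R.k = U.k), the helper run_job, and the constant NUM_BUCKETS are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from itertools import product

NUM_BUCKETS = 2  # hypothetical number of reducers for the non-equi-join


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1 (assumed to run first): non-equi-join of R and S on R.a < S.b.
def theta_map(record):
    tag, row = record
    if tag == "R":
        # Replicate every R row to all buckets.
        for bucket in range(NUM_BUCKETS):
            yield bucket, record
    else:
        # Send every S row to exactly one bucket, so each (r, s) pair
        # meets in exactly one reducer and no duplicates are produced.
        yield row[0] % NUM_BUCKETS, record


def theta_reduce(_bucket, values):
    r_rows = [row for tag, row in values if tag == "R"]
    s_rows = [row for tag, row in values if tag == "S"]
    for (k, a), (b,) in product(r_rows, s_rows):
        if a < b:  # the theta condition
            yield ("RS", (k, a, b))


# Job 2: multi-way equi-join of the job 1 output with T and U on key k.
def equi_map(record):
    _tag, row = record
    yield row[0], record  # the join key k is the first field of every row


def equi_reduce(k, values):
    by_tag = defaultdict(list)
    for tag, row in values:
        by_tag[tag].append(row)
    for rs, t, u in product(by_tag["RS"], by_tag["T"], by_tag["U"]):
        yield (k, rs[1], rs[2], t[1], u[1])


def two_job_theta_join(r, s, t, u):
    job1_input = [("R", row) for row in r] + [("S", row) for row in s]
    intermediate = run_job(job1_input, theta_map, theta_reduce)
    job2_input = (intermediate
                  + [("T", row) for row in t]
                  + [("U", row) for row in u])
    return run_job(job2_input, equi_map, equi_reduce)


R = [(1, 5), (2, 9)]          # (k, a)
S = [(7,), (10,)]             # (b,)
T = [(1, "t1"), (2, "t2")]    # (k, t)
U = [(1, "u1")]               # (k, u)
result = two_job_theta_join(R, S, T, U)
```

Only one intermediate result (the output of the first job) is materialized, whatever the number of relations in the equi-join, which is the source of the I/O saving the abstract describes. Replicating R to every bucket is a deliberately naive choice made here for brevity; a real implementation would need a partitioning scheme that balances reducer load.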
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yan, K., Zhu, H. (2013). Two MRJs for Multi-way Theta-Join in MapReduce. In: Pathan, M., Wei, G., Fortino, G. (eds) Internet and Distributed Computing Systems. IDCS 2013. Lecture Notes in Computer Science, vol 8223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41428-2_26
DOI: https://doi.org/10.1007/978-3-642-41428-2_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41427-5
Online ISBN: 978-3-642-41428-2
eBook Packages: Computer Science, Computer Science (R0)