Two MRJs for Multi-way Theta-Join in MapReduce

  • Conference paper
Internet and Distributed Computing Systems (IDCS 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8223)

Abstract

MapReduce is one of the most popular platforms used in cloud computing for large-scale data processing. Such processing commonly involves multi-way Theta-join operations. Although a multi-way Theta-join can be processed in MapReduce as a sequence of MRJs (MapReduce Jobs), doing so incurs a high I/O cost because intermediate results must be stored between consecutive MRJs. We therefore focus on improving the performance of multi-way Theta-joins by reducing the number of MRJs. In this paper, a multi-way Theta-join is processed in only two MRJs: it is decomposed into a non-Equi-join and a multi-way Equi-join, and each of these join operations is processed in one MRJ. Our experiments show that the method performs well.
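The abstract does not spell out how the decomposition is carried out, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes three hypothetical relations R(a, b), S(b, c) and T(c, d) and the query R.b = S.b AND S.c = T.c AND R.a < T.d. A first simulated MRJ evaluates the non-Equi-join condition, and a second simulated MRJ evaluates the Equi-join conditions on a composite key. The helper `run_mrj`, the relation names and the in-memory "shuffle" are all assumptions made for illustration; no Hadoop API is used.

```python
from collections import defaultdict
from itertools import product


def run_mrj(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


# MRJ 1 (assumed): the non-Equi-join condition R.a < T.d.
def map_theta(record):
    # A single key sends every tuple to one reducer; a real job would
    # partition the R x T comparison space across many reducers.
    yield 0, record


def reduce_theta(_key, values):
    r_rows = [row for tag, row in values if tag == "R"]
    t_rows = [row for tag, row in values if tag == "T"]
    for r, t in product(r_rows, t_rows):
        if r[0] < t[1]:
            yield ("RT", (r, t))


# MRJ 2 (assumed): the Equi-join conditions R.b = S.b and S.c = T.c.
def map_equi(record):
    tag, row = record
    if tag == "RT":
        r, t = row
        yield (r[1], t[0]), record      # key = (R.b, T.c)
    else:                               # tag == "S"
        yield (row[0], row[1]), record  # key = (S.b, S.c)


def reduce_equi(_key, values):
    rt_rows = [row for tag, row in values if tag == "RT"]
    s_rows = [row for tag, row in values if tag == "S"]
    for (r, t), s in product(rt_rows, s_rows):
        yield (r, s, t)


def two_mrj_theta_join(R, S, T):
    """Evaluate the example query with exactly two simulated MRJs."""
    job1_input = [("R", r) for r in R] + [("T", t) for t in T]
    intermediate = run_mrj(job1_input, map_theta, reduce_theta)
    job2_input = intermediate + [("S", s) for s in S]
    return run_mrj(job2_input, map_equi, reduce_equi)


# Hypothetical toy data.
R = [(1, "x"), (5, "y")]
S = [("x", 10), ("y", 20)]
T = [(10, 3), (20, 4)]
result = two_mrj_theta_join(R, S, T)
print(result)
```

In this sketch only one intermediate result (the output of the first job) is written between jobs, which is the I/O saving the abstract argues for when compared with a longer chain of MRJs.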

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yan, K., Zhu, H. (2013). Two MRJs for Multi-way Theta-Join in MapReduce. In: Pathan, M., Wei, G., Fortino, G. (eds) Internet and Distributed Computing Systems. IDCS 2013. Lecture Notes in Computer Science, vol 8223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41428-2_26

  • DOI: https://doi.org/10.1007/978-3-642-41428-2_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41427-5

  • Online ISBN: 978-3-642-41428-2

  • eBook Packages: Computer Science, Computer Science (R0)
