
Parallel Join Algorithms in MapReduce

  • Living reference work entry
Encyclopedia of Big Data Technologies

Definition

The MapReduce framework is often used to analyze large volumes of unstructured and semi-structured data. A common analysis pattern combines a massive file that describes events (commonly a log) with much smaller reference datasets. This analytical operation corresponds to a parallel join. Parallel joins have been studied extensively in data management research, and many of the algorithms used in relational database management systems are tailored to exploit particular properties of the input or of the analysis. The MapReduce framework, however, was designed to operate on a single input, which makes it cumbersome for join processing, since a join combines two or more datasets. As a consequence, a new class of parallel join algorithms has been designed, implemented, and optimized specifically for the MapReduce framework.
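The tension between MapReduce's single-input design and a log combined with a small reference dataset can be illustrated with a single-process Python sketch. It is not taken from this entry and is not Hadoop code: `run_mapreduce`, `repartition_join`, `broadcast_join`, and the sample records are hypothetical names chosen for illustration. The repartition (reduce-side) join works around the single-input model by tagging each record with its source and shuffling both datasets on the join key. The broadcast (map-side) join instead assumes the reference dataset is small enough to fit in each mapper's memory, so every mapper probes an in-memory hash table and no shuffle is needed.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical record shape: (join_key, value). A log event carries an event
# payload; a reference row carries descriptive attributes for the same key.
Record = Tuple[str, str]
Joined = Tuple[str, str, str]


def run_mapreduce(inputs: Iterable, map_fn: Callable, reduce_fn: Callable) -> List:
    """Single-process stand-in for map -> shuffle (group by key) -> reduce."""
    groups: Dict[str, List] = defaultdict(list)
    for item in inputs:
        for key, value in map_fn(item):
            groups[key].append(value)
    output: List = []
    for key in sorted(groups):  # a reducer sees its keys in sorted order
        output.extend(reduce_fn(key, groups[key]))
    return output


def repartition_join(log: List[Record], ref: List[Record]) -> List[Joined]:
    """Reduce-side join: shuffle both inputs on the join key."""
    # MapReduce consumes a single input, so tag every record with its origin
    # and feed the union of both datasets to the job.
    tagged = [("log", r) for r in log] + [("ref", r) for r in ref]

    def map_fn(item):
        tag, (key, value) = item
        yield key, (tag, value)

    def reduce_fn(key, values):
        # Buffer both sides for this key, then emit their cross product.
        events = [v for tag, v in values if tag == "log"]
        attrs = [v for tag, v in values if tag == "ref"]
        for event in events:
            for attr in attrs:
                yield (key, event, attr)

    return run_mapreduce(tagged, map_fn, reduce_fn)


def broadcast_join(log: List[Record], ref: List[Record]) -> List[Joined]:
    """Map-only join: every mapper holds the small dataset in memory."""
    # Assumes the reference dataset fits in each mapper's memory, so no
    # shuffle or reduce phase is needed; the large log is never repartitioned.
    table: Dict[str, List[str]] = defaultdict(list)
    for key, attr in ref:
        table[key].append(attr)
    joined: List[Joined] = []
    for key, event in log:  # one pass over the mapper's split of the log
        for attr in table.get(key, []):
            joined.append((key, event, attr))
    return joined


log = [("u1", "click"), ("u2", "view"), ("u1", "buy"), ("u3", "view")]
ref = [("u1", "Alice"), ("u2", "Bob")]
print(repartition_join(log, ref))
# [('u1', 'click', 'Alice'), ('u1', 'buy', 'Alice'), ('u2', 'view', 'Bob')]
print(broadcast_join(log, ref))
# [('u1', 'click', 'Alice'), ('u2', 'view', 'Bob'), ('u1', 'buy', 'Alice')]
```

Both functions produce the same joined pairs, possibly in a different order. The trade-off the sketch hints at is network traffic: the repartition join ships the entire log through the shuffle, whereas the broadcast join copies only the small reference dataset to every mapper.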

Overview

Since its introduction, the MapReduce framework (Dean and Ghemawat 2004) has become extremely popular for analyzing large datasets. The success of MapReduce stems from...



Author information


Correspondence to Spyros Blanas.


Copyright information

© 2018 Springer International Publishing AG

About this entry


Cite this entry

Blanas, S. (2018). Parallel Join Algorithms in MapReduce. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_206-1


  • DOI: https://doi.org/10.1007/978-3-319-63962-8_206-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering
