Scientific Fundamentals
MapReduce refers to both a programming model and the corresponding distributed framework. The model is composed of two phases, map and reduce, which manipulate data formatted as key-value pairs. The map phase transforms input records into intermediate key-value pairs, which are partitioned and sorted by key, whereas the reduce phase applies a user-defined function to all the data that share the same key. In this way, MapReduce is a typical divide-and-conquer framework designed to handle embarrassingly parallel problems, namely problems that can be split into sub-tasks with little or no synchronization cost.
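The two phases can be illustrated with a small single-process sketch in Python. It is not the API of any real MapReduce engine: the names `run_mapreduce`, `map_words`, and `reduce_counts` are hypothetical, word counting is chosen only as a familiar example, and the in-memory grouping step stands in for the distributed shuffle and sort that a real framework would perform across nodes.

```python
from collections import defaultdict

def map_words(_doc_id, text):
    """Hypothetical map function: emit (word, 1) for every word in a document."""
    for word in text.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    """Hypothetical reduce function: sum all counts emitted for one word."""
    return word, sum(counts)

def run_mapreduce(records, mapper, reducer):
    """Run map, group-by-key, and reduce sequentially in one process."""
    # Map phase: each input record is processed independently of the others,
    # which is what makes this step embarrassingly parallel.
    intermediate = []
    for key, value in records:
        intermediate.extend(mapper(key, value))

    # Shuffle step (simulated): collect all values that share the same key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: apply the user-defined function once per key, in key order.
    return [reducer(key, groups[key]) for key in sorted(groups)]

documents = [
    ("d1", "the quick brown fox"),
    ("d2", "the lazy dog and the fox"),
]
result = run_mapreduce(documents, map_words, reduce_counts)
print(result)
```

Because each call to the map function touches one record and each call to the reduce function touches one key, both loops could in principle be distributed across workers without coordination beyond the grouping step.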
Definition
MapReduce is a programming framework that allows users to process large-scale data by exploiting the parallelism of a cluster of nodes. The term also refers to the distributed engine that splits and distributes users’ jobs and monitors their processing in the cluster. MapReduce is a typical divide-and-conquer framework, since it transforms the user code into an embarrassingly parallel job, where little or no effort...
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Wu, S. (2018). MapReduce. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80802
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80802
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering