MapReduce

Wu, Sai

doi:10.1007/978-1-4614-8265-9_80802

Reference work entry
First Online: 01 January 2018

Scientific Fundamentals

MapReduce refers to both a programming model and the corresponding distributed framework. The model consists of two phases, map and reduce, which operate on data formatted as key-value pairs. The map phase applies a user-defined function to each input record and emits intermediate key-value pairs, which the framework partitions and sorts by key; the reduce phase then applies a user-defined function to all values that share the same key. In this way, MapReduce is a typical divide-and-conquer framework designed to handle embarrassingly parallel problems, namely problems that can be split into sub-tasks with little or no synchronization cost.
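The two-phase model can be illustrated with a single-process sketch. This is not the API of Hadoop or of Google's implementation: `run_mapreduce`, `wc_map`, and `wc_reduce` are hypothetical names, the whole data set is assumed to fit in memory, and the "shuffle" is simply a dictionary grouped by key and visited in sorted key order. The job shown is word count, the running example in Dean and Ghemawat's paper.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]
MapFn = Callable[[Any], Iterable[Pair]]
ReduceFn = Callable[[Any, List[Any]], Iterable[Pair]]


def run_mapreduce(records: Iterable[Any], map_fn: MapFn, reduce_fn: ReduceFn) -> List[Pair]:
    # Map phase: the user-defined map function turns each input record
    # into zero or more intermediate (key, value) pairs.
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)

    # Shuffle/sort: values are grouped by key and keys are visited in
    # sorted order (this assumes the keys are mutually comparable).
    output: List[Pair] = []
    for key in sorted(groups):
        # Reduce phase: the user-defined reduce function sees every value
        # that shares one key, and only those values.
        output.extend(reduce_fn(key, groups[key]))
    return output


# Hypothetical word-count job: map emits (word, 1), reduce sums the ones.
def wc_map(line: str) -> Iterable[Pair]:
    for word in line.lower().split():
        yield word, 1


def wc_reduce(word: str, counts: List[int]) -> Iterable[Pair]:
    yield word, sum(counts)


docs = ["the quick brown fox", "the lazy dog", "The fox"]
print(run_mapreduce(docs, wc_map, wc_reduce))
# [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

Because each call to `reduce_fn` depends only on the values of one key, every key group could be handed to a different machine without coordination, which is what makes the model embarrassingly parallel.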

Definition

MapReduce is a programming framework that allows users to process large-scale data by exploiting the parallelism of a cluster of nodes. The term also refers to the distributed engine that splits users' jobs into tasks, disseminates them to the nodes of the cluster, and monitors their processing. MapReduce is a typical divide-and-conquer framework, since it transforms the user code into an embarrassingly parallel job, where little or no effort...
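The engine side of the definition, splitting a job into independent map and reduce tasks and dispatching them to workers, can be sketched in one process. This is a minimal sketch under stated assumptions, not how Hadoop or Google's system is implemented: threads stand in for cluster nodes (in CPython they illustrate the task structure rather than speed anything up), round-robin slicing stands in for input splits, and `run_job`, `map_task`, `reduce_task`, `partition`, `num_splits`, and `num_reducers` are hypothetical names. The CRC32-based partitioner follows the spirit of the hash(key) mod R default described by Dean and Ghemawat. Fault tolerance, data locality, and task monitoring are omitted.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, int]
Bucket = Dict[str, List[int]]
MapFn = Callable[[str], Iterable[Pair]]
ReduceFn = Callable[[str, List[int]], Iterable[Pair]]


def partition(key: str, num_reducers: int) -> int:
    # Hypothetical partitioner: CRC32(key) mod R, stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_task(split: List[str], map_fn: MapFn, num_reducers: int) -> List[Bucket]:
    # One map task processes its own split and writes into R local
    # buckets without communicating with any other map task.
    buckets: List[Bucket] = [defaultdict(list) for _ in range(num_reducers)]
    for record in split:
        for key, value in map_fn(record):
            buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_task(buckets: List[Bucket], reduce_fn: ReduceFn) -> List[Pair]:
    # One reduce task merges the bucket it owns from every map task,
    # then calls the reduce function once per key, in key order.
    merged: Bucket = defaultdict(list)
    for bucket in buckets:
        for key, values in bucket.items():
            merged[key].extend(values)
    output: List[Pair] = []
    for key in sorted(merged):
        output.extend(reduce_fn(key, merged[key]))
    return output


def run_job(records: List[str], map_fn: MapFn, reduce_fn: ReduceFn,
            num_splits: int = 3, num_reducers: int = 2) -> List[Pair]:
    # Split the input into independent chunks (round-robin, for simplicity).
    splits = [records[i::num_splits] for i in range(num_splits)]
    with ThreadPoolExecutor() as pool:
        # Map stage: list() waits for every map task, acting as the
        # barrier between the map and reduce stages.
        map_outputs = list(pool.map(lambda s: map_task(s, map_fn, num_reducers), splits))
        # Shuffle: reducer r receives bucket r from each map task.
        per_reducer = [[out[r] for out in map_outputs] for r in range(num_reducers)]
        # Reduce stage: R independent tasks, one per partition.
        reduce_outputs = list(pool.map(lambda b: reduce_task(b, reduce_fn), per_reducer))
    return [pair for part in reduce_outputs for pair in part]


def wc_map(line: str) -> Iterable[Pair]:
    for word in line.lower().split():
        yield word, 1


def wc_reduce(word: str, counts: List[int]) -> Iterable[Pair]:
    yield word, sum(counts)


docs = ["the quick brown fox", "the lazy dog", "The fox"]
print(sorted(run_job(docs, wc_map, wc_reduce)))
# [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

Since every key is routed to exactly one partition, the reduce tasks never need to exchange data, and the result is the same for any choice of `num_splits` and `num_reducers`; only the order of keys across partitions changes.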

Recommended Reading

Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation; 2004. p. 137–50.
https://hadoop.apache.org/
Ghemawat S, Gobioff H, Leung S-T. The Google file system. In: Proceedings of the 19th ACM Symposium on Operating Systems Principles; 2003. p. 29–43.
Dittrich J, Quiané-Ruiz J-A, Jindal A, Kargin Y, Setty V, Schad J. Hadoop++: making a yellow elephant run like a cheetah (without it even noticing). Proc VLDB Endow. 2010;3(1):518–29.
http://hbase.apache.org
Thusoo A, Sarma JS, Jain N, Shao Z, Chakka P, Anthony S, Liu H, Wyckoff P, Murthy R. Hive – a warehousing solution over a map-reduce framework. Proc VLDB Endow. 2009;2(2):1626–9.
http://mahout.apache.org
Olston C, Reed B, Srivastava U, Kumar R, Tomkins A. Pig Latin: a not-so-foreign language for data processing. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2008. p. 1099–110.
https://developer.yahoo.com/blogs/hadoop/
Pavlo A, Paulson E, Rasin A, Abadi DJ, DeWitt DJ, Madden S, Stonebraker M. A comparison of approaches to large-scale data analysis. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2009. p. 165–78.
Jiang D, Ooi BC, Shi L, Wu S. The performance of MapReduce: an in-depth study. Proc VLDB Endow. 2010;3(1):472–83.
Wu S, Li F, Mehrotra S, Ooi BC. Query optimization for massively parallel data processing. In: Proceedings of the 2nd ACM Symposium on Cloud Computing; 2011. p. 12.
Afrati FN, Das Sarma A, Menestrina D, Parameswaran AG, Ullman JD. Fuzzy joins using MapReduce. In: Proceedings of the 28th International Conference on Data Engineering; 2012. p. 498–509.
Nykiel T, Potamias M, Mishra C, Kollios G, Koudas N. MRShare: sharing across multiple queries in MapReduce. Proc VLDB Endow. 2010;3(1):494–505.
Li F, Ooi BC, Özsu MT, Wu S. Distributed data management using MapReduce. ACM Comput Surv. 2014;46(3):31:1–31:42.
Abouzeid A, Bajda-Pawlikowski K, Abadi DJ, Rasin A, Silberschatz A. HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Proc VLDB Endow. 2009;2(1):922–33.
http://www.informationweek.com/cloud/software-as-a-service/google-i-o-hello-dataflow-goodbye-mapreduce/d/d-id/1278917
Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM. Distributed GraphLab: a framework for machine learning in the cloud. Proc VLDB Endow. 2012;5(8):716–27.
Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G. Pregel: a system for large-scale graph processing. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2010. p. 135–46.
Jiang D, Chen G, Ooi BC, Tan K-L, Wu S. epiC: an extensible and scalable system for processing big data. Proc VLDB Endow. 2014;7(7):541–52.

Author information

Authors and Affiliations

Zhejiang University, Hangzhou, Zhejiang, People’s Republic of China
Sai Wu

Corresponding author

Correspondence to Sai Wu.

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

Section Editor information

College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
Ling Liu
Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
M. Tamer Özsu

About this entry

Cite this entry

Wu, S. (2018). MapReduce. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80802

DOI: https://doi.org/10.1007/978-1-4614-8265-9_80802
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering
