Compilation of Query Languages into MapReduce

Sauer, Caetano; Härder, Theo

doi:10.1007/s13222-012-0112-8


Focus article
Published: 24 January 2013

Datenbank-Spektrum, Volume 13, pages 5–15 (2013)



Abstract

The introduction of MapReduce as a tool for Big Data analytics, combined with the new requirements of emerging application scenarios such as Web 2.0 and scientific computing, has motivated the development of data processing languages that are more flexible and more widely applicable than SQL. In the Big Data context, we discuss the aspects in which SQL is considered too restrictive, and we provide a qualitative evaluation of how recent query languages overcome these restrictions. Having established the desired characteristics of a query language, we provide an abstract description of the compilation of such languages into the MapReduce programming model, which, up to minor variations, is essentially the same in all approaches. Given the requirements of query processing, we introduce simple generalizations of the model that allow the reuse of well-established query evaluation techniques, and we discuss strategies for generating optimized MapReduce plans.
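To illustrate the kind of compilation the abstract refers to, the following Python sketch shows one common pattern. This is an assumption made for illustration and is not claimed to be the article's own scheme: selection and projection are evaluated inside the map function, the grouping attribute becomes the intermediate key, and the aggregate is computed in the reduce function. The driver `run_mapreduce`, the functions `map_fn` and `reduce_fn`, and the `emp` table are all hypothetical. The driver simulates MapReduce in memory on a single machine and is not Hadoop's API.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical single-process simulation of one MapReduce job."""
    groups = defaultdict(list)
    # Map phase: each call may emit zero or more (key, value) pairs.
    for record in records:
        for key, value in map_fn(record):
            # Shuffle: collect all intermediate values that share a key.
            groups[key].append(value)
    # Reduce phase: one call per distinct key, visited in sorted key order
    # (mirroring the sorted keys each Hadoop reducer receives).
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output


# Hypothetical query being compiled:
#   SELECT dept, SUM(salary) FROM emp WHERE salary > 1000 GROUP BY dept
def map_fn(row):
    # WHERE clause and projection are pipelined into the map function.
    if row["salary"] > 1000:
        # The GROUP BY attribute becomes the intermediate key.
        yield row["dept"], row["salary"]


def reduce_fn(dept, salaries):
    # SUM is evaluated once per group in the reduce function.
    yield dept, sum(salaries)


emp = [
    {"name": "a", "dept": "db", "salary": 1200},
    {"name": "b", "dept": "db", "salary": 800},
    {"name": "c", "dept": "os", "salary": 1500},
    {"name": "d", "dept": "db", "salary": 2000},
]
print(run_mapreduce(emp, map_fn, reduce_fn))
```

Running the script prints `[('db', 3200), ('os', 1500)]`. The row with salary 800 is discarded in the map phase and never reaches the shuffle, which is why pushing selections into map is attractive.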


Notes

Although cycles are necessary to model more “exotic” operations such as recursive queries.
Newer revisions of SQL actually include the types MULTISET and ROW, but they are not used transparently as the type of stored tables and intermediate results.
Note that the MapReduce authors ambiguously refer to map as the first-order function that, in the functional programming setting, is actually a parameter to the higher-order function map.
Note that if we consider the implementation of Shuffle as a two-phase step (first sorting each map task's output locally and then globally merging all partitions), the first phase can start before all map tasks have completed. Nevertheless, because the merge phase still has to wait for all map tasks, the process as a whole remains synchronous.
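The last two notes (the user-supplied function that MapReduce calls map versus the higher-order map that applies it, and the two-phase view of Shuffle) can be sketched with a toy word count in Python. All names (`user_map`, `map_task`, `shuffle_merge`, `reduce_task`) are hypothetical, and everything runs in a single process; this is a sketch under these assumptions, not Hadoop's implementation.

```python
import heapq
from itertools import chain, groupby
from operator import itemgetter


def user_map(line):
    # What MapReduce calls "map": a first-order function from one record
    # to a list of (key, value) pairs.
    return [(word, 1) for word in line.split()]


def map_task(split):
    # The higher-order map applies user_map to every record of one split;
    # since each call returns a list, the results are flattened (a flatMap).
    pairs = list(chain.from_iterable(map(user_map, split)))
    # Shuffle phase 1: local sort of this task's output. It depends only on
    # this task, so it can start before other map tasks have finished.
    pairs.sort(key=itemgetter(0))
    return pairs


def shuffle_merge(sorted_runs):
    # Shuffle phase 2: k-way merge of all locally sorted runs. Any run may
    # still contribute to the smallest key, so no group is complete until
    # every run is available: this is the synchronization point.
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_task(key, values):
    return key, sum(values)


splits = [["a b a"], ["b c"], ["a c c"]]
runs = [map_task(split) for split in splits]
result = [reduce_task(key, values) for key, values in shuffle_merge(runs)]
print(result)
```

The script prints `[('a', 3), ('b', 2), ('c', 3)]`. Sorting inside `map_task` keeps the per-task work independent, while `heapq.merge` only has to combine already sorted runs instead of re-sorting everything.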

References

Afrati FN, Ullman JD (2011) Optimizing multiway joins in a map-reduce environment. IEEE Trans Knowl Data Eng 23(9):1282–1298
Bächle S (2012) Separating key concerns in query processing—set orientation, physical data independence, and parallelism. PhD thesis, University of Kaiserslautern, Germany
Battré D, Ewen S, Hueske F, Kao O, Markl V, Warneke D (2010) Nephele/PACTs: a programming model and execution framework for web-scale analytical processing. In: SoCC, pp 119–130
Beyer KS, Ercegovac V, Gemulla R, Balmin A, Eltabakh MY, Kanne CC, Özcan F, Shekita EJ (2011) Jaql: a scripting language for large-scale semistructured data analysis. Proc VLDB Endow 4(12):1272–1283
Dean J, Ghemawat S (2010) MapReduce: a flexible data processing tool. Commun ACM 53(1):72–77
Dittrich J, Quiané-Ruiz JA, Jindal A, Kargin Y, Setty V, Schad J (2010) Hadoop++: making a yellow elephant run like a cheetah (without it even noticing). Proc VLDB Endow 3(1):518–529
Gates A, Natkovich O, Chopra S, Kamath P, Narayanam S, Olston C, Reed B, Srinivasan S, Srivastava U (2009) Building a high-level dataflow system on top of MapReduce: the pig experience. Proc VLDB Endow 2(2):1414–1425
Graefe G (1993) Query evaluation techniques for large databases. ACM Comput Surv 25(2):73–170
Herodotou H, Babu S (2011) Profiling, what-if analysis, and cost-based optimization of MapReduce programs. Proc VLDB Endow 4(11):1111–1122
Hueske F, Peters M, Sax M, Rheinländer A, Bergmann R, Krettek A, Tzoumas K (2012) Opening the black boxes in data flow optimization. Proc VLDB Endow 5(11):1256–1267
Isard M, Budiu M, Yu Y, Birrell A, Fetterly D (2007) Dryad: distributed data-parallel programs from sequential building blocks. In: EuroSys, pp 59–72
Jahani E, Cafarella MJ, Ré C (2011) Automatic optimization for MapReduce programs. Proc VLDB Endow 4(6):385–396
Lämmel R (2008) Google’s MapReduce programming model—revisited. Sci Comput Program 70(1):1–30
Okcan A, Riedewald M (2011) Processing theta-joins using MapReduce. In: SIGMOD conference, pp 949–960
Olston C, Reed B, Srivastava U, Kumar R, Tomkins A (2008) Pig Latin: a not-so-foreign language for data processing. In: SIGMOD conference, pp 1099–1110
Pike R, Dorward S, Griesemer R, Quinlan S (2005) Interpreting the data: parallel analysis with Sawzall. Sci Program 13(4):277–298
Sauer C, Bächle S, Härder T (2012) Versatile query processing in the MapReduce framework based on XQuery (submitted)
Thusoo A, Sarma JS, Jain N, Shao Z, Chakka P, Zhang N, Anthony S, Liu H, Murthy R (2010) Hive—a petabyte scale data warehouse using Hadoop. In: ICDE conference, pp 996–1005
W3C (2011) XQuery 3.0: an XML query language. http://www.w3.org/TR/xquery-30/
W3C (2011) XQuery and XPath data model 3.0. http://www.w3.org/TR/xpath-datamodel-30/
White T (2011) Hadoop—the definitive guide: storage and analysis at Internet scale, 2nd edn. O’Reilly, Sebastopol
Zhang X, Chen L, Wang M (2012) Efficient multi-way theta-join processing using MapReduce. Proc VLDB Endow 5(11):1184–1195


Author information

Authors and Affiliations

University of Kaiserslautern, Kaiserslautern, Germany
Caetano Sauer & Theo Härder


Corresponding author

Correspondence to Theo Härder.


About this article

Cite this article

Sauer, C., Härder, T. Compilation of Query Languages into MapReduce. Datenbank Spektrum 13, 5–15 (2013). https://doi.org/10.1007/s13222-012-0112-8


Received: 26 November 2012
Accepted: 20 December 2012
Published: 24 January 2013
Issue Date: March 2013
DOI: https://doi.org/10.1007/s13222-012-0112-8
