Comparing High Level MapReduce Query Languages

Stewart, Robert J.; Trinder, Phil W.; Loidl, Hans-Wolfgang

doi:10.1007/978-3-642-24151-2_5

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6965)

Included in the following conference series:

International Workshop on Advanced Parallel Processing Technologies

Abstract

The MapReduce parallel computational model is of increasing importance. A number of High Level Query Languages (HLQLs) have been constructed on top of the Hadoop MapReduce realization, primarily Pig, Hive, and JAQL. This paper makes a systematic performance comparison of these three HLQLs, focusing on scale up, scale out and runtime metrics. We further make a language comparison of the HLQLs, focusing on conciseness and computational power. The HLQL development communities are engaged in the study, which has revealed the technical bottlenecks and limitations described in this paper and is influencing their development.

Author information

Authors and Affiliations

Mathematical and Computer Sciences, Heriot Watt University, UK
Robert J. Stewart, Phil W. Trinder & Hans-Wolfgang Loidl

Editor information

Editors and Affiliations

INRIA Saclay, Parc Club Universite, rue Jean Rostand, Batiment G, 91893, Orsay Cedex, France
Olivier Temam
Department of Computer Science and Engineering, University of Minnesota, 200 Union Street, SE, 55455, Minneapolis, MN, USA
Pen-Chung Yew
Fudan University, Software Building, 825 Zhangheng Road, 200433, Shanghai, China
Binyu Zang

About this paper

Cite this paper

Stewart, R.J., Trinder, P.W., Loidl, H.-W. (2011). Comparing High Level MapReduce Query Languages. In: Temam, O., Yew, P.-C., Zang, B. (eds) Advanced Parallel Processing Technologies. APPT 2011. Lecture Notes in Computer Science, vol 6965. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24151-2_5

DOI: https://doi.org/10.1007/978-3-642-24151-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24150-5
Online ISBN: 978-3-642-24151-2
eBook Packages: Computer Science, Computer Science (R0)
