Abstract
The MapReduce parallel computational model is of increasing importance. A number of High Level Query Languages (HLQLs) have been constructed on top of the Hadoop MapReduce realization, primarily Pig, Hive, and JAQL. This paper presents a systematic performance comparison of these three HLQLs, focusing on scale-up, scale-out, and runtime metrics. We also compare the languages themselves in terms of conciseness and computational power. The HLQL development communities are engaged in the study, which revealed the technical bottlenecks and limitations described in this paper and is influencing their development.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stewart, R.J., Trinder, P.W., Loidl, HW. (2011). Comparing High Level MapReduce Query Languages. In: Temam, O., Yew, PC., Zang, B. (eds) Advanced Parallel Processing Technologies. APPT 2011. Lecture Notes in Computer Science, vol 6965. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24151-2_5
DOI: https://doi.org/10.1007/978-3-642-24151-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24150-5
Online ISBN: 978-3-642-24151-2
eBook Packages: Computer Science, Computer Science (R0)