Comparing High Level MapReduce Query Languages

  • Conference paper
Advanced Parallel Processing Technologies (APPT 2011)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 6965)

Abstract

The MapReduce parallel computational model is of increasing importance. A number of High Level Query Languages (HLQLs) have been built on top of the Hadoop MapReduce implementation, primarily Pig, Hive, and JAQL. This paper presents a systematic performance comparison of these three HLQLs, focusing on scale-up, scale-out, and runtime metrics. We also compare the languages themselves, focusing on conciseness and computational power. The HLQL development communities have engaged with the study, which revealed the technical bottlenecks and limitations described in this paper, and its findings are influencing their development.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stewart, R.J., Trinder, P.W., Loidl, HW. (2011). Comparing High Level MapReduce Query Languages. In: Temam, O., Yew, PC., Zang, B. (eds) Advanced Parallel Processing Technologies. APPT 2011. Lecture Notes in Computer Science, vol 6965. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24151-2_5

  • DOI: https://doi.org/10.1007/978-3-642-24151-2_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24150-5

  • Online ISBN: 978-3-642-24151-2

  • eBook Packages: Computer Science, Computer Science (R0)
