Hive

  • Living reference work entry

Definition

Apache Hive is data warehousing software that facilitates reading, writing, and managing large data sets residing in distributed storage using SQL (Apache Hive PMC 2017).

Overview

Hive enables data warehousing in the Apache Hadoop ecosystem. It can run in traditional Hadoop clusters or in cloud environments, and it can work with data sets as large as multiple petabytes. Initially, Hive was used mainly for ETL and batch processing. While still supporting these use cases, it has evolved to also support data warehousing use cases such as reporting, interactive queries, and business intelligence. This evolution has been accomplished by adopting many common data warehousing techniques and adapting them to the Hadoop ecosystem. Hive is implemented in Java.

Architecture

Hive’s architecture is shown in Figure 1. Not all of the components in the diagram are required in every installation. LLAP and HiveServer2 are optional; the Metastore can be run embedded in HiveServer2 or...

References

  • Apache Hive (2017) http://hive.apache.org/

  • Apache Hive SQL Conformance (2017) https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+SQL+Conformance. Accessed 11 Nov 2017

  • Boncz P et al (2005) MonetDB/X100: hyper-pipelining query execution. In: Proceedings of the 2005 CIDR conference, Asilomar, pp 225–237

  • Huai Y et al (2014) Major technical advancements in Apache Hive. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data, Snowbird Utah, 22–27 June 2014

  • Saha B et al (2015) Apache Tez: a unifying framework for modeling and building data processing applications. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, Melbourne, 31 May–4 June 2015

  • Shanklin C (2014) Benchmarking Apache Hive 13 for enterprise Hadoop. https://hortonworks.com/blog/benchmarking-apache-hive-13-enterprise-hadoop/. Accessed 9 Nov 2017

  • Vavilapalli V et al (2013) Apache Hadoop YARN: yet another resource negotiator. In: Proceedings of the 4th annual symposium on cloud computing, Santa Clara, 1–3 Oct 2013

  • Zaharia M (2010) Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX conference on hot topics in cloud computing, Boston, 22–25 June 2010

Author information

Corresponding author

Correspondence to Alan F. Gates.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this entry

Cite this entry

Gates, A.F. (2018). Hive. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_250-1

  • DOI: https://doi.org/10.1007/978-3-319-63962-8_250-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering
