Hive

Gates, Alan F.

doi:10.1007/978-3-319-63962-8_250-1

Living reference work entry
First Online: 27 March 2018

Definition

Apache Hive is data warehousing software that facilitates reading, writing, and managing large data sets residing in distributed storage using SQL (Apache Hive PMC 2017).

Overview

Hive enables data warehousing in the Apache Hadoop ecosystem. It can run in traditional Hadoop clusters or in cloud environments, and it can work with data sets as large as multiple petabytes. Initially, Hive was used mainly for ETL (extract, transform, load) and batch processing. While it still supports these use cases, it has evolved to also support data warehousing workloads such as reporting, interactive queries, and business intelligence. This evolution has been accomplished by adopting many common data warehousing techniques and adapting them to the Hadoop ecosystem. Hive is implemented in Java.
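To make the "managing, writing, and reading with SQL" idea concrete, the following minimal sketch assembles three HiveQL statements of the kind used in ETL and reporting work. It is only an illustration: the table names (page_views, raw_page_views), the columns, the partition key, and the storage path are hypothetical and do not come from this entry. The Python wrapper only builds the statement text; it does not connect to a Hive installation.

```python
from typing import Dict


def hive_statements(table: str = "page_views", day: str = "2017-11-11") -> Dict[str, str]:
    """Return illustrative HiveQL statements keyed by the task they perform.

    All table, column, and path names are hypothetical examples.
    """
    # Managing: declare a partitioned table over files in distributed storage.
    manage = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  user_id BIGINT,\n"
        "  url STRING,\n"
        "  view_time TIMESTAMP\n"
        ")\n"
        "PARTITIONED BY (view_date STRING)\n"
        "STORED AS ORC\n"
        f"LOCATION '/warehouse/{table}'"
    )
    # Writing: a batch ETL step that loads one day of data into one partition.
    write = (
        f"INSERT INTO TABLE {table} PARTITION (view_date = '{day}')\n"
        "SELECT user_id, url, view_time\n"
        "FROM raw_page_views\n"
        f"WHERE to_date(view_time) = '{day}'"
    )
    # Reading: a reporting-style query that touches only that partition.
    read = (
        "SELECT url, COUNT(*) AS views\n"
        f"FROM {table}\n"
        f"WHERE view_date = '{day}'\n"
        "GROUP BY url\n"
        "ORDER BY views DESC\n"
        "LIMIT 10"
    )
    return {"manage": manage, "write": write, "read": read}


if __name__ == "__main__":
    # Print the statements so they can be pasted into a Hive client.
    for task, sql in hive_statements().items():
        print(f"-- {task}\n{sql};\n")
```

Running the script prints the three statements, which could then be submitted through whichever Hive client an installation provides. Partitioning by date is one common design choice for this kind of table, since it lets the reporting query skip data for other days.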

Architecture

Hive’s architecture is shown in figure 1. Not all of the components in the diagram are required in every installation. LLAP and HiveServer2 are optional; the Metastore can be run embedded in HiveServer2 or...

References

Apache Hive (2017) http://hive.apache.org/
Apache Hive SQL Conformance (2017) https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+SQL+Conformance. Accessed 11 Nov 2017
Boncz P et al (2005) MonetDB/X100: hyper-pipelining query execution. In: Proceedings of the 2005 CIDR conference, Asilomar, pp 225–237
Huai Y et al (2014) Major technical advancements in Apache Hive. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data, Snowbird, Utah, 22–27 June 2014
Saha B et al (2015) Apache Tez: a unifying framework for modeling and building data processing applications. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, Melbourne, 31 May–4 June 2015
Shanklin C (2014) Benchmarking Apache Hive 13 for enterprise Hadoop. https://hortonworks.com/blog/benchmarking-apache-hive-13-enterprise-hadoop/. Accessed 9 Nov 2017
Vavilapalli V et al (2013) Apache Hadoop YARN: yet another resource negotiator. In: Proceedings of the 4th annual symposium on cloud computing, Santa Clara, 1–3 Oct 2013
Zaharia M et al (2010) Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX conference on hot topics in cloud computing, Boston, 22–25 June 2010

Author information

Authors and Affiliations

Hortonworks, Santa Clara, CA, USA
Alan F. Gates

Corresponding author

Correspondence to Alan F. Gates.

Editor information

Editors and Affiliations

School of Comp. Sci. and Engineering, University of New South Wales, Eveleigh, New South Wales, Australia
Sherif Sakr
Sch of Info Techno, Building J12, University of Sydney, Sydney, Australia
Albert Zomaya

Section Editor information

IBM Almaden Research Center, San Jose, CA, USA
Yuanyuan Tian
IBM Research – Almaden, San Jose, CA, USA
Fatma Özcan

About this entry

Cite this entry

Gates, A.F. (2018). Hive. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_250-1

DOI: https://doi.org/10.1007/978-3-319-63962-8_250-1
Received: 26 February 2018
Accepted: 04 March 2018
Published: 27 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering