Definition
Apache Hive is data warehousing software that facilitates reading, writing, and managing large data sets residing in distributed storage using SQL (Apache Hive PMC 2017).
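As a rough illustration of what "reading, writing, and managing" data with SQL can look like from a client program, the following Java sketch issues a few HiveQL statements over JDBC. It is only a sketch under stated assumptions: the table name `page_views`, its columns, the host, and the database are hypothetical, and the `run` method assumes a reachable HiveServer2 instance that accepts unauthenticated connections, with the Hive JDBC driver on the classpath. Without an argument, the program only prints the statements it would send.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

final class HiveSqlSketch {

    // Builds a HiveServer2 JDBC URL; host, port, and database are supplied by the caller.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // "Managing" and "writing": create a table and add one row.
    // The table layout is a made-up example, not taken from the entry.
    static List<String> writeStatements(String table) {
        return List.of(
            "CREATE TABLE IF NOT EXISTS " + table
                + " (user_id BIGINT, url STRING) STORED AS ORC",
            "INSERT INTO " + table + " VALUES (1, 'https://example.org/')");
    }

    // "Reading": a simple aggregate query of the kind used for reporting.
    static String readQuery(String table) {
        return "SELECT url, COUNT(*) AS views FROM " + table + " GROUP BY url";
    }

    // Sends the statements to Hive. Assumes a running HiveServer2 and that the
    // Hive JDBC driver is available; neither is provided by this sketch.
    static void run(String url, String table) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            for (String sql : writeStatements(table)) {
                stmt.execute(sql);
            }
            try (ResultSet rs = stmt.executeQuery(readQuery(table))) {
                while (rs.next()) {
                    System.out.println(rs.getString("url") + "\t" + rs.getLong("views"));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        String table = "page_views"; // hypothetical table name
        if (args.length == 1) {
            // A JDBC URL was given on the command line: talk to a real server.
            run(args[0], table);
        } else {
            // Dry run: only show the URL and statements that would be used.
            System.out.println(jdbcUrl("localhost", 10000, "default"));
            writeStatements(table).forEach(System.out::println);
            System.out.println(readQuery(table));
        }
    }
}
```

Keeping the statement text in separate methods from the connection logic lets the SQL be inspected without a cluster; in a real deployment the URL would typically also carry authentication settings that are omitted here.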
Overview
Hive enables data warehousing in the Apache Hadoop ecosystem. It can run in traditional Hadoop clusters or in cloud environments, and it can work with data sets as large as multiple petabytes. Initially, Hive was used mainly for ETL and batch processing. While still supporting these use cases, it has evolved to also support data warehousing use cases such as reporting, interactive queries, and business intelligence. This evolution has been accomplished by adopting many common data warehousing techniques and adapting them to the Hadoop ecosystem. Hive is implemented in Java.
Architecture
Hive’s architecture is shown in figure 1. Not all of the components in the diagram are required in every installation. LLAP and HiveServer2 are optional; the Metastore can be run embedded in HiveServer2 or...
References
Apache Hive (2017) http://hive.apache.org/
Apache Hive SQL Conformance (2017) https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+SQL+Conformance. Accessed 11 Nov 2017
Boncz P et al (2005) MonetDB/X100: hyper-pipelining query execution. In: Proceedings of the 2005 CIDR conference, Asilomar, pp 225–237
Huai Y et al (2014) Major technical advancements in Apache Hive. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data, Snowbird, Utah, 22–27 June 2014
Saha B et al (2015) Apache Tez: a unifying framework for modeling and building data processing applications. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, Melbourne, 31 May–4 June 2015
Shanklin C (2014) Benchmarking Apache Hive 13 for enterprise Hadoop. https://hortonworks.com/blog/benchmarking-apache-hive-13-enterprise-hadoop/. Accessed 9 Nov 2017
Vavilapalli V et al (2013) Apache Hadoop YARN: yet another resource negotiator. In: Proceedings of the 4th annual symposium on cloud computing, Santa Clara, 1–3 Oct 2013
Zaharia M et al (2010) Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX conference on hot topics in cloud computing, Boston, 22–25 June 2010
© 2018 Springer International Publishing AG, part of Springer Nature
About this entry
Cite this entry
Gates, A.F. (2018). Hive. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_250-1
DOI: https://doi.org/10.1007/978-3-319-63962-8_250-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering