Performance Comparison Between Apache Hive and Oracle SQL for Big Data Analytics

Sethy, Rotsnarani; Dash, Santosh Kumar; Panda, Mrutyunjaya

doi:10.1007/978-3-319-60618-7_14

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 614)

Included in the following conference series:

International Conference on Soft Computing and Pattern Recognition

Abstract

Big data refers to volumes of data so massive that they cannot be stored, processed, and managed by traditional database management systems. Big data analytics has become a broad research area that attracts both academia and industry, which seek to extract knowledge and information from large amounts of data. Oracle SQL is a prominent DBMS used worldwide, but its running time increases as the data grows. Apache Hive makes large-scale data analysis possible in a minimal amount of time: it facilitates reading, writing, and managing big datasets in a distributed environment using SQL, whereas Oracle SQL provides an integrated development environment for running queries and scripts. In this paper, we run a few queries on smaller as well as larger data sets and analyze the results in both the Apache Hive and Oracle SQL environments.

Author information

Authors and Affiliations

Department of Computer Science, Utkal University, Bhubaneswar, India
Rotsnarani Sethy, Santosh Kumar Dash & Mrutyunjaya Panda

Corresponding author

Correspondence to Rotsnarani Sethy.

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research, Machine Intelligence Research Labs (MIR Labs), Auburn, Washington, USA
Ajith Abraham
VIT University, Vellore, Tamil Nadu, India
Aswani Kumar Cherukuri
School of Engineering, Polytechnic of Porto (ISEP/IPP), Porto, Portugal
Ana Maria Madureira
Universiti Teknikal Malaysia Melaka, Durian Tunggal, Malaysia
Azah Kamilah Muda

About this paper

Cite this paper

Sethy, R., Dash, S.K., Panda, M. (2018). Performance Comparison Between Apache Hive and Oracle SQL for Big Data Analytics. In: Abraham, A., Cherukuri, A., Madureira, A., Muda, A. (eds) Proceedings of the Eighth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016). SoCPaR 2016. Advances in Intelligent Systems and Computing, vol 614. Springer, Cham. https://doi.org/10.1007/978-3-319-60618-7_14

DOI: https://doi.org/10.1007/978-3-319-60618-7_14
Published: 19 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60617-0
Online ISBN: 978-3-319-60618-7
eBook Packages: Engineering, Engineering (R0)
