
QMapper for Smart Grid: Migrating SQL-based Application to Hive

Published: 27 May 2015

Abstract

Apache Hive has been widely used by Internet companies for big data analytics applications. It compiles high-level query languages into efficient MapReduce workflows, which frees users from complicated and time-consuming programming. The popularity of Hive and HiveQL-compatible systems such as Impala and Shark has attracted attention from traditional enterprises as well. However, enterprise big data processing systems such as Smart Grid applications often have to migrate their RDBMS-based legacy applications to Hive rather than write new logic directly in HiveQL. Because the two differ in syntax and cost model, manually translating RDBMS SQL into HiveQL is difficult and error-prone, and often leads to poor performance.
In this paper, we propose QMapper, a tool for automatically translating SQL into proper HiveQL. QMapper consists of a rule-based rewriter and a cost-based optimizer. Experiments based on the TPC-H benchmark demonstrate that, compared to manually rewritten Hive queries provided by Hive contributors, QMapper dramatically reduces query latency on average. Our real-world Smart Grid application also demonstrates its efficiency.
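
As a rough illustration of the two components named above, the following Python sketch pairs one rewrite rule with a cost-based choice among candidate queries. It is not QMapper's implementation: the abstract does not list the tool's rules or its cost model. The rule shown (turning an `IN` subquery into a `LEFT SEMI JOIN`, a form that older Hive releases required) is assumed here only as a typical SQL-to-HiveQL difference, and the names `in_to_semi_join`, `toy_cost`, and `choose_cheapest` are hypothetical, as is the join-counting cost function.

```python
import re
from typing import Callable, Iterable

# Hypothetical rewrite rule (an assumption, not taken from the paper):
#   ... FROM t1 WHERE a IN (SELECT b FROM t2)
# becomes
#   ... FROM t1 LEFT SEMI JOIN t2 ON (t1.a = t2.b)
# The pattern only handles this one simple, unqualified shape.
_IN_SUBQUERY = re.compile(
    r"FROM\s+(?P<outer>\w+)\s+WHERE\s+(?P<col>\w+)\s+IN\s+"
    r"\(\s*SELECT\s+(?P<sub_col>\w+)\s+FROM\s+(?P<inner>\w+)\s*\)",
    re.IGNORECASE,
)


def in_to_semi_join(sql: str) -> str:
    """Apply the rule above; return the query unchanged if it does not match."""
    def repl(m: re.Match) -> str:
        return (
            f"FROM {m['outer']} LEFT SEMI JOIN {m['inner']} "
            f"ON ({m['outer']}.{m['col']} = {m['inner']}.{m['sub_col']})"
        )
    return _IN_SUBQUERY.sub(repl, sql)


def toy_cost(hiveql: str) -> int:
    """Placeholder cost: count JOIN keywords. A real optimizer would
    estimate MapReduce job cost from statistics instead."""
    return len(re.findall(r"\bJOIN\b", hiveql, re.IGNORECASE))


def choose_cheapest(candidates: Iterable[str],
                    cost: Callable[[str], int] = toy_cost) -> str:
    """Cost-based step: pick the candidate with the lowest estimated cost."""
    return min(candidates, key=cost)


if __name__ == "__main__":
    sql = ("SELECT o_orderkey FROM orders "
           "WHERE o_custkey IN (SELECT c_custkey FROM customer)")
    print(in_to_semi_join(sql))
```

The split mirrors the structure described in the abstract: rules produce syntactically valid HiveQL candidates, and a separate cost function decides among them, so either part can be replaced without touching the other.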



    Published In

    SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
    May 2015
    2110 pages
    ISBN: 9781450327589
    DOI: 10.1145/2723372

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. hive
    2. join optimization
    3. sql on hadoop
    4. system migration

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China

    Conference

    SIGMOD/PODS'15: International Conference on Management of Data
    May 31 - June 4, 2015
    Melbourne, Victoria, Australia

    Acceptance Rates

    SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Cited By

    • (2021) The implementation of data storage and analytics platform for big data lake of electricity usage with spark. The Journal of Supercomputing 77:6 (5934-5959). DOI: 10.1007/s11227-020-03505-6. Online publication date: 1-Jun-2021
    • (2019) Use of Big Data in Aviation. Automated Systems in the Aviation and Aerospace Industries (436-452). DOI: 10.4018/978-1-5225-7709-6.ch017. Online publication date: 2019
    • (2016) Leveraging Data Analytics by Transforming Relational Database Schema in to Big Data. Trends in Computer Science and Information Technology 1:1 (012-017). DOI: 10.17352/tcsit.000002. Online publication date: 30-Dec-2016
    • (2016) PerfOrator. Proceedings of the Seventh ACM Symposium on Cloud Computing (415-427). DOI: 10.1145/2987550.2987566. Online publication date: 5-Oct-2016
    • (2016) Hug the Elephant: Migrating a Legacy Data Analytics Application to Hadoop Ecosystem. 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME) (177-187). DOI: 10.1109/ICSME.2016.14. Online publication date: Oct-2016
    • (2016) On Construction of an Energy Monitoring Service Using Big Data Technology for Smart Campus. 2016 7th International Conference on Cloud Computing and Big Data (CCBD) (81-86). DOI: 10.1109/CCBD.2016.026. Online publication date: Nov-2016
