DOI: 10.1145/3448016.3457562
research-article

Greenplum: A Hybrid Database for Transactional and Analytical Workloads

Published: 18 June 2021

Abstract

Demand is growing for enterprise data warehouse solutions that support real-time Online Transaction Processing (OLTP) queries as well as long-running Online Analytical Processing (OLAP) workloads. Greenplum database is traditionally known as an OLAP data warehouse system with limited ability to process OLTP workloads. In this paper, we augment Greenplum into a hybrid system that serves both OLTP and OLAP workloads. The challenge we address is to achieve this goal while maintaining the ACID properties with minimal performance overhead. To this end, we identify the engineering and performance bottlenecks, such as the under-performing, restrictive locking and the two-phase commit protocol. Next, we resolve the resource contention between transactional and analytical queries. We propose a global deadlock detector to increase the concurrency of query processing. When a transaction's updates are guaranteed to reside on exactly one segment, we introduce a one-phase commit to speed up query processing. Our resource group model introduces the capability to separate OLAP and OLTP workloads into more suitable query processing modes. Our experimental evaluation on the TPC-B and CH-benCHmark benchmarks demonstrates the effectiveness of our approach in boosting OLTP performance without sacrificing OLAP performance.
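
As a rough illustration of the one-phase commit condition stated in the abstract, the following Python sketch picks a commit protocol from the set of segments a transaction has written to. The function name `choose_commit_protocol`, the integer segment ids, and the separate read-only case are assumptions made for illustration; this is not Greenplum's implementation.

```python
from typing import Iterable


def choose_commit_protocol(written_segments: Iterable[int]) -> str:
    """Pick a commit protocol from the ids of the segments a transaction wrote to.

    Hypothetical helper: the name and return values are illustrative only.
    """
    segments = set(written_segments)
    if not segments:
        # Assumption: a transaction that wrote nothing needs no distributed commit.
        return "read-only"
    if len(segments) == 1:
        # All updates reside on exactly one segment, so a single commit
        # round is enough and the prepare phase can be skipped.
        return "one-phase"
    # Updates span several segments: prepare on all of them, then commit.
    return "two-phase"


if __name__ == "__main__":
    print(choose_commit_protocol([3, 3]))     # all writes on one segment
    print(choose_commit_protocol([0, 1, 2]))  # writes on several segments
```

The point of the sketch is only that the decision depends on where the updated data resides, not on how many statements the transaction ran.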

Supplementary Material

MP4 File (3448016.3457562.mp4)
Traditional methods for data analytics rely on two disparate database systems with different goals. One database system processes transactional workloads, known as Online Transactional Processing (OLTP), characterized by low-latency but highly concurrent operations. The other is a data warehouse that processes queries for data analysis, known as Online Analytical Processing (OLAP), characterized by long-running operations such as joins and aggregates that involve computation on multiple tables and cause bulk reads and writes.

Greenplum database is traditionally known as an OLAP data warehouse system without sufficient OLTP ability. In this paper, we aim to address the ever-growing demand for real-time data analysis and to enhance Greenplum with the hybrid capability to serve both OLTP and OLAP workloads, while maintaining the ACID properties and delivering satisfactory performance. To build an efficient parallel HTAP system based on an OLAP system, we first identified engineering and performance challenges, such as the under-performing, restrictive locking and the two-phase commit protocol. We proposed and implemented solutions to address these challenges in Greenplum, including a global deadlock detector, which is used to increase the concurrency of query processing, and a one-phase commit protocol, which is used to reduce table locking and speed up query processing. A further improvement, named resource groups, is deployed to separate OLAP and OLTP workloads into more suitable query processing modes. These ideas boost OLTP performance without sacrificing OLAP performance or the ACID properties. Performance evaluation on the TPC-B and CH-benCHmark benchmarks shows a significant improvement.
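
To make the idea of a global deadlock detector more concrete, here is a minimal Python sketch, assuming each segment reports its local wait-for edges as `(waiting_txn, holding_txn)` pairs. It merges the edges into one graph and looks for a cycle with a depth-first search. This is plain cycle detection, swapped in for illustration; the detector described in the paper may differ, and the name `has_global_deadlock` and the edge format are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]  # (waiting transaction id, holding transaction id)


def has_global_deadlock(local_edges: Iterable[Iterable[Edge]]) -> bool:
    """Return True if the union of per-segment wait-for edges contains a cycle."""
    # Merge the wait-for edges reported by every segment into one graph.
    graph: Dict[int, Set[int]] = {}
    for edges in local_edges:
        for waiter, holder in edges:
            graph.setdefault(waiter, set()).add(holder)
            graph.setdefault(holder, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {txn: WHITE for txn in graph}

    def visit(txn: int) -> bool:
        colour[txn] = GREY
        for nxt in graph[txn]:
            if colour[nxt] == GREY:
                return True  # back edge: a cycle of waiting transactions
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[txn] = BLACK
        return False

    return any(colour[txn] == WHITE and visit(txn) for txn in graph)


if __name__ == "__main__":
    # Transaction 1 waits for 2 on one segment while 2 waits for 1 on another:
    # neither segment sees a local cycle, but the merged graph has one.
    print(has_global_deadlock([[(1, 2)], [(2, 1)]]))
```

The example input shows why a per-segment check is not enough: the cycle only appears once edges from different segments are combined.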

Published In

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN: 9781450383431
DOI: 10.1145/3448016

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. database
  2. hybrid transactional and analytical process

Conference

SIGMOD/PODS '21

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
