DOI: 10.1145/3514221.3526148
research-article
Public Access

How Good is My HTAP System?

Published: 11 June 2022 Publication History

Abstract

Hybrid Transactional and Analytical Processing (HTAP) systems have recently gained popularity because they combine OLAP and OLTP processing, reducing the administrative and synchronization costs of maintaining dedicated systems. However, there is no precise characterization of the features that distinguish a good HTAP system from a poor one. In this paper, we address this problem from the perspectives of both performance and freshness. To capture the performance of transactional and analytical processing simultaneously, we introduce a new concept called the throughput frontier, which visualizes both transactional and analytical throughput in a single 2D graph. The throughput frontier captures the performance of each engine, the interference between the two engines, and the effects of various system design decisions. To capture how well an HTAP system supports real-time analytics, we define a freshness metric, which quantifies how recent the snapshot of the data seen by each analytical query is. We also develop a practical way to measure freshness in a real system. We design a new hybrid benchmark called HATtrick, which incorporates both the throughput frontier and freshness as metrics. Using the benchmark, we evaluate three representative HTAP systems under various data sizes and system configurations and demonstrate how the metrics reveal important system characteristics and performance information.
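As a rough illustration of the two metrics described in the abstract, the sketch below may help. It is not the HATtrick implementation. It assumes that each benchmark run yields one (transactional throughput, analytical throughput) point and approximates the frontier as the set of non-dominated points. It also assumes that staleness can be expressed as the gap between a query's start time and the commit time of the earliest transaction missing from its snapshot. The names `throughput_frontier` and `freshness_lag`, and both modeling choices, are hypothetical and are not taken from the abstract.

```python
from typing import List, Optional, Sequence, Tuple

# One measurement: (transactional throughput, analytical throughput).
Point = Tuple[float, float]


def throughput_frontier(points: Sequence[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by transactional throughput.

    Assumption for illustration: a point is on the frontier if no other
    measured point is at least as good on both axes.
    """
    # Visit points from highest to lowest transactional throughput.
    ordered = sorted(set(points), key=lambda p: (-p[0], -p[1]))
    frontier: List[Point] = []
    best_analytical = float("-inf")
    for t_tput, a_tput in ordered:
        # Keep a point only if it beats every point seen so far
        # (all of which have >= transactional throughput) analytically.
        if a_tput > best_analytical:
            frontier.append((t_tput, a_tput))
            best_analytical = a_tput
    return sorted(frontier)


def freshness_lag(query_start: float,
                  first_unseen_commit: Optional[float]) -> float:
    """Staleness of one analytical query, in the same unit as the inputs.

    Assumption for illustration: `first_unseen_commit` is the commit time of
    the earliest transaction missing from the query's snapshot, or None if
    the snapshot is missing nothing. A result of 0.0 means the query saw
    everything committed before it started.
    """
    if first_unseen_commit is None:
        return 0.0
    return max(0.0, query_start - first_unseen_commit)


if __name__ == "__main__":
    # Hypothetical measurements from runs with different client mixes.
    runs = [(0, 50), (100, 48), (200, 30), (150, 25), (300, 0), (100, 40)]
    print(throughput_frontier(runs))
    print(freshness_lag(query_start=10.0, first_unseen_commit=9.5))
```

In this sketch, a frontier that stays close to the rectangle spanned by the two single-workload maxima would suggest little interference between the engines, while a frontier that bends toward the origin would suggest that the two workloads compete for resources.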

Published In

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN: 9781450392495
DOI: 10.1145/3514221
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. HTAP
  2. benchmark
  3. freshness score
  4. throughput frontier

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '22
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 820
  • Downloads (last 6 weeks): 109

Reflects downloads up to 01 Mar 2025.

Cited By

  • (2024) Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD. Proceedings of the VLDB Endowment 17:11 (3629-3643). DOI: 10.14778/3681954.3682026. Online publication date: 30-Aug-2024
  • (2024) Two Birds With One Stone: Designing a Hybrid Cloud Storage Engine for HTAP. Proceedings of the VLDB Endowment 17:11 (3290-3303). DOI: 10.14778/3681954.3682001. Online publication date: 30-Aug-2024
  • (2024) HyBench: A New Benchmark for HTAP Databases. Proceedings of the VLDB Endowment 17:5 (939-951). DOI: 10.14778/3641204.3641206. Online publication date: 2-May-2024
  • (2024) HTAP Databases: A Survey. IEEE Transactions on Knowledge and Data Engineering 36:11 (6410-6429). DOI: 10.1109/TKDE.2024.3389693. Online publication date: Nov-2024
  • (2024) A survey on hybrid transactional and analytical processing. The VLDB Journal 33:5 (1485-1515). DOI: 10.1007/s00778-024-00858-9. Online publication date: 4-Jun-2024
  • (2023) Krypton: Real-Time Serving and Analytical SQL Engine at ByteDance. Proceedings of the VLDB Endowment 16:12 (3528-3542). DOI: 10.14778/3611540.3611545. Online publication date: 1-Aug-2023
  • (2023) VeriTxn: Verifiable Transactions for Cloud-Native Databases with Storage Disaggregation. Proceedings of the ACM on Management of Data 1:4 (1-27). DOI: 10.1145/3626764. Online publication date: 12-Dec-2023
  • (2023) Rethink Query Optimization in HTAP Databases. Proceedings of the ACM on Management of Data 1:4 (1-27). DOI: 10.1145/3626750. Online publication date: 12-Dec-2023
  • (2023) Benchmarking HTAP databases for performance isolation and real-time analytics. BenchCouncil Transactions on Benchmarks, Standards and Evaluations 3:2 (100122). DOI: 10.1016/j.tbench.2023.100122. Online publication date: Jun-2023
