research-article

Tao: Improving Resource Utilization while Guaranteeing SLO in Multi-tenant Relational Database-as-a-Service

Published: 30 September 2024

Abstract

Guaranteeing tenants' service-level objectives (SLOs) while achieving high resource utilization remains an open challenge for cloud database service providers. In this work, we propose Tao, a novel system that addresses this challenge. Tao consists of three key components: (i) a tasklet-based DAG generator, (ii) a tasklet-based DAG executor, and (iii) an SLO-guaranteed scheduler. The core concept in Tao is the tasklet, a coroutine-based lightweight execution unit of the physical execution plan. First, the tasklet-based DAG generator converts each SQL operator in the traditional physical execution plan into a set of fine-grained tasklets. Next, we abstract the tasklet-based DAG execution procedure and implement the tasklet-based DAG executor using C++20 coroutines. Finally, the SLO-guaranteed scheduler schedules tenants' tasklets across CPU cores; it guarantees tenants' SLOs with a token bucket model and improves resource utilization with an on-demand core adjustment strategy. We build Tao on Hyrise, an open-source relational database, and conduct extensive experimental studies to demonstrate its superiority over existing solutions.
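The abstract names two mechanisms without giving their details: tasklets as coroutine-based units of work, and a token bucket model for per-tenant scheduling. The following is a minimal sketch of how a generic token bucket can gate cooperative tasklets. It is not Tao's implementation: Tao uses C++20 coroutines, whereas this sketch uses Python generators as stand-ins, and every name here (`TokenBucket`, `tasklet`, `schedule`, the rates, and the simulated clock) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Generic token bucket (illustrative; not Tao's data structure)."""
    rate: float              # tokens added per unit of simulated time
    capacity: float          # upper bound on accumulated tokens (burst limit)
    tokens: float = 0.0      # tokens currently available
    last_refill: float = 0.0 # simulated time of the last refill

    def refill(self, now: float) -> None:
        # Add tokens for the elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now

    def try_consume(self, cost: float, now: float) -> bool:
        # Refill first, then spend tokens only if enough are available.
        self.refill(now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def tasklet(name, steps):
    """Hypothetical tasklet: yields control after each small unit of work."""
    for i in range(steps):
        yield f"{name}:{i}"


def schedule(tenants, ticks, step_cost=1.0, dt=1.0):
    """Run at most one tasklet step per tenant per tick, gated by tokens.

    tenants maps a tenant name to a (TokenBucket, generator) pair.
    Returns the ordered list of steps that actually ran.
    """
    trace = []
    now = 0.0
    active = dict(tenants)
    for _ in range(ticks):
        now += dt
        for name in list(active):
            bucket, work = active[name]
            if not bucket.try_consume(step_cost, now):
                continue  # out of tokens: this tenant is throttled this tick
            try:
                trace.append(next(work))
            except StopIteration:
                del active[name]  # tasklet finished; stop scheduling it
    return trace


if __name__ == "__main__":
    tenants = {
        "A": (TokenBucket(rate=2.0, capacity=2.0), tasklet("A", 10)),
        "B": (TokenBucket(rate=0.5, capacity=1.0), tasklet("B", 10)),
    }
    print(schedule(tenants, ticks=4))
    # prints ['A:0', 'A:1', 'B:0', 'A:2', 'A:3', 'B:1']
```

In this sketch the refill rate plays the role of a tenant's reserved share: tenant A earns enough tokens to run every tick, while tenant B earns half a token per tick and therefore runs every second tick. The on-demand core adjustment strategy mentioned in the abstract is not modeled here.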

Published In

Proceedings of the ACM on Management of Data  Volume 2, Issue 4
SIGMOD
September 2024
458 pages
EISSN:2836-6573
DOI:10.1145/3698442
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. database-as-a-service
  2. multi-tenancy
  3. service-level objective
