research-article

Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation

Authors:

Bo TangAuthors Info & Claims

Proceedings of the ACM on Management of Data, Volume 1, Issue 1

Article No.: 28, Pages 1 - 25

https://doi.org/10.1145/3588708

Published: 30 May 2023 Publication History

Abstract

Fast query execution requires learning-based cardinality estimators to have short inference time (as model inference time adds to end-to-end query execution time) and high estimation accuracy (which is crucial for finding good execution plan). However, existing estimators cannot meet both requirements due to the inherent tension between model complexity and estimation accuracy. We propose a novel Learning-based Progressive Cardinality Estimator (LPCE), which adopts a query re-optimization methodology. In particular, LPCE consists of an initial model (LPCE-I), which estimates cardinality before query execution, and a refinement model (LPCE-R), which progressively refines the cardinality estimations using the actual cardinalities of the executed operators. During query execution, re-optimization is triggered if the estimations of LPCE-I are found to have large errors, and more efficient execution plans are selected for the remaining operators using the refined estimations provided by LPCE-R. Both LPCE-I and LPCE-R are light-weight query-driven estimators but they achieve both good efficiency and high accuracy when used jointly. Besides designing the models for LPCE-I and LPCE-R, we also integrate re-optimization and LPCE into PostgreSQL, a popular database engine. Extensive experiments show that LPCE yields shorter end-to-end query execution time than state-of-the-art learning-based estimators.

Supplemental Material

MP4 File

Fast query execution requires learning-based cardinality estimators to have short inference time (as model inference time adds to end-to-end query execution time) and high estimation accuracy (which is crucial for finding good execution plan). However, existing estimators cannot meet both requirements due to the inherent tension between model complexity and estimation accuracy. We propose a novel Learning-based Progressive Cardinality Estimator (LPCE), which adopts a query re-optimization methodology. In particular, LPCE consists of an initial model (LPCE-I), which estimates cardinality before query execution, and a refinement model (LPCE-R), which progressively refines the cardinality estimations using the actual cardinalities of the executed operators. During query execution, re-optimization is triggered if the estimations of LPCE-I are found to have large errors, and more efficient execution plans are selected for the remaining operators using the refined estimations provided by LPCE-R. Both LPCE-I and LPCE-R are light-weight query-driven estimators but they achieve both good efficiency and high accuracy when used jointly. Besides designing the models for LPCE-I and LPCE-R, we also integrate re-optimization and LPCE into PostgreSQL, a popular database engine. Extensive experiments show that LPCE yields shorter end-to-end query execution time than state-of-the-art learning-based estimators.

Download
70.58 MB

References

[1]

Riham Abdel Kader, Peter Boncz, Stefan Manegold, and Maurice Van Keulen. 2009. ROX: run-time optimization of XQueries. In SIGMOD. 615--626.

[2]

Ashraf Aboulnaga and Surajit Chaudhuri. 1999. Self-tuning Histograms: Building Histograms Without Looking at Data. In SIGMOD. 181--192.

Digital Library

[3]

Musbah Abdulkarim Musbah Ataya and Musab AM Ali. 2019. Acceptance of Website Security on E-banking. A-Review. In IEEE 10th Control and System Graduate Research Colloquium. 201--206.

[4]

Ron Avnur and Joseph M Hellerstein. 2000. Eddies: Continuously adaptive query processing. In SIGMOD. 261--272.

Digital Library

[5]

Olivier Chapelle and Lihong Li. 2011. An empirical evaluation of thompson sampling. Advances in neural information processing systems 24 (2011).

[6]

Amol Deshpande, Minos N. Garofalakis, and Rajeev Rastogi. 2001. Independence is Good: Dependency-Based Histogram Synopses for High-Dimensional Data. In SIGMOD. 199--210.

[7]

Anshuman Dutt and Jayant R Haritsa. 2014. Plan bouquets: query processing without selectivity estimation. In SIGMOD. 1039--1050.

[8]

Anshuman Dutt, Chi Wang, Vivek Narasayya, and Surajit Chaudhuri. 2020. Efficiently approximating selectivity functions using low overhead regression models. PVLDB 13, 12 (2020), 2215--2228.

Digital Library

[9]

Anshuman Dutt, Chi Wang, Azade Nazi, Srikanth Kandula, Vivek Narasayya, and Surajit Chaudhuri. 2019. Selectivity estimation for range predicates using lightweight models. PVLDB 12, 9 (2019), 1044--1057.

Digital Library

[10]

Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, et al . 2021. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. PVLDB 15, 4 (2021), 1--12.

[11]

Shohedul Hasan, Saravanan Thirumuruganathan, Jees Augustine, Nick Koudas, and Gautam Das. 2020. Deep learning models for selectivity estimation of multi-attribute queries. In SIGMOD. 1035--1050.

[12]

Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kristian Kersting, and Carsten Binnig. 2020. DeepDB: Learn from Data, not from Queries! PVLDB 13, 7 (2020), 992--1005.

Digital Library

[13]

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).

[14]

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural computation 9, 8 (1997), 1735--1780.

Digital Library

[15]

Navin Kabra and David J DeWitt. 1998. Efficient mid-query re-optimization of sub-optimal query execution plans. In SIGMOD. 106--117.

[16]

Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter A. Boncz, and Alfons Kemper. 2019. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In CIDR.

[17]

Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, and Ion Stoica. 2018. Learning to optimize join queries with d reinforcement learning. arXiv preprint arXiv:1808.03196 (2018).

[18]

Hai Lan, Zhifeng Bao, and Yuwei Peng. 2021. A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration. Data Science and Engineering (2021), 1--16.

[19]

Tao Lei, Yu Zhang, and Yoav Artzi. 2017. Training RNNs as Fast as CNNs. arXiv preprint arXiv:1709.02755 (2017).

[20]

Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? PVLDB 9, 3 (2015), 204--215.

Digital Library

[21]

Viktor Leis, Bernhard Radke, Andrey Gubichev, Alfons Kemper, and Thomas Neumann. 2017. Cardinality Estimation Done Right: Index-Based Join Sampling. In CIDR.

[22]

Jie Liu, Wenqian Dong, Qingqing Zhou, and Dong Li. 2021. Fauce: fast and accurate deep ensembles with uncertainty for cardinality estimation. PVLDB 14, 11 (2021), 1950--1963.

Digital Library

[23]

Guy Lohman. 2014. Is query optimization a "solved" problem ?. In Proc. Workshop on Database Query Optimization. 10.

[24]

Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoia, Sanchit Misra, Alfons Kemper, Thomas Neumann, and Tim Kraska. 2021. Flow-Loss: Learning Cardinality Estimates That Matter. PVLDB 14, 11 (2021).

[25]

Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making learned query optimization practical. In SIGMOD. 1275--1288.

[26]

Ryan C. Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A Learned Query Optimizer. PVLDB 12, 11 (2019), 1705--1718.

Digital Library

[27]

Volker Markl, Vijayshankar Raman, David Simmen, Guy Lohman, Hamid Pirahesh, and Miso Cilimdzic. 2004. Robust query processing through progressive optimization. In SIGMOD. 659--670.

[28]

Thomas Neumann and César A. Galindo-Legaria. 2013. Taking the Edge off Cardinality Estimation Errors using Incremental Execution. In Datenbanksysteme für Business, Technologie und Web (BTW). 73--92.

[29]

Danilo Rezende and Shakir Mohamed. 2015. Variational inference with normalizing flows. In ICML. 1530--1538.

[30]

Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. 2015. FitNets: Hints for Thin Deep Nets. In ICLR.

[31]

Michael Stillger, Guy M Lohman, Volker Markl, and Mokhtar Kandil. 2001. LEO-DB2's learning optimizer. In VLDB. 19--28.

[32]

Ji Sun and Guoliang Li. 2019. An End-to-End Learning-based Cost Estimator. PVLDB 13, 3 (2019), 307--319.

Digital Library

[33]

Ji Sun, Jintao Zhang, Zhaoyan Sun, Guoliang Li, and Nan Tang. 2021. Learned cardinality estimation: A design space exploration and a comparative evaluation. PVLDB 15, 1 (2021), 85--97.

Digital Library

[34]

Immanuel Trummer, Junxiong Wang, Ziyun Wei, Deepak Maram, Samuel Moseley, Saehan Jo, Joseph Antonakakis, and Ankush Rayabhari. 2021. Skinnerdb: Regret-bounded query evaluation via reinforcement learning. TODS 46, 3 (2021), 1--45.

Digital Library

[35]

Fang Wang, Xiao Yan, Manlung Yiu, Shuai Li, Zunyao Mao, and Bo Tang. 2022. Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation (Technical report). https://github.com/Eilowangfang/LPCE/blob/master/techreport.pdf

[36]

Jiayi Wang, Chengliang Chai, Jiabin Liu, and Guoliang Li. 2021. FACE: a normalizing flow based cardinality estimator. PVLDB 15, 1 (2021), 72--84.

Digital Library

[37]

Xiaoying Wang, Changbo Qu, Weiyuan Wu, Jiannan Wang, and Qingqing Zhou. 2021. Are We Ready For Learned Cardinality Estimation? PVLDB 14, 9 (2021), 1640--1654.

Digital Library

[38]

Peizhi Wu and Gao Cong. 2021. A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation. In SIGMOD. 2009--2022.

[39]

Renzhi Wu, Bolin Ding, Xu Chu, Zhewei Wei, Xiening Dai, Tao Guan, and Jingren Zhou. 2021. Learning to be a Statistician: Learned Estimator for Number of Distinct Values. PVLDB 15, 2 (2021), 272--284.

Digital Library

[40]

Wentao Wu, Jeffrey F Naughton, and Harneet Singh. 2016. Sampling-based query re-optimization. In SIGMOD. 1721--1736.

[41]

Ziniu Wu, Amir Shaikhha, Rong Zhu, Kai Zeng, Yuxing Han, and Jingren Zhou. 2020. BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation. arXiv e-prints (2020), arXiv--2012.

[42]

Ziniu Wu, Rong Zhu, Andreas Pfadler, Yuxing Han, Jiangneng Li, Zhengping Qian, Kai Zeng, and Jingren Zhou. 2020. FSPN: A New Class of Probabilistic Graphical Model. CoRR abs/2011.09020 (2020). https://arxiv.org/abs/2011.09020

[43]

Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, and Ion Stoica. 2022. Balsa: Learning a Query Optimizer Without Expert Demonstrations. In SIGMOD. 931--944.

[44]

Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, and Ion Stoica. 2020. NeuroCard: One Cardinality Estimator for All Tables. PVLDB 14, 1 (2020), 61--73.

Digital Library

[45]

Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Peter Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, and Ion Stoica. 2019. Deep Unsupervised Cardinality Estimation. PVLDB 13, 3 (2019), 279--292.

Digital Library

[46]

Xiang Yu, Guoliang Li, Chengliang Chai, and Nan Tang. 2020. Reinforcement learning with tree-lstm for join order selection. In ICDE. 1297--1308.

[47]

Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, and Alexander Smola. 2017. Deep sets. arXiv preprint arXiv:1703.06114 (2017).

[48]

Rong Zhu, Ziniu Wu, Yuxing Han, Kai Zeng, Andreas Pfadler, Zhengping Qian, Jingren Zhou, and Bin Cui. 2021. FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. PVLDB 14, 9 (2021), 1489--1502.

Digital Library

[49]

Rong Zhu, Tianjing Zeng, Andreas Pfadler, Wei Chen, Bolin Ding, and Jingren Zhou. 2021. Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query Size. arXiv preprint arXiv:2112.03458 (2021).

Cited By

Bergmann RHartmann CHabich DLehner W(2025)An Elephant Under the Microscope: Analyzing the Interaction of Optimizer Components in PostgreSQLProceedings of the ACM on Management of Data10.1145/37096593:1(1-28)Online publication date: 11-Feb-2025
https://dl.acm.org/doi/10.1145/3709659
Zhang WLim WButrovich MPavlo A(2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
https://dl.acm.org/doi/10.14778/3681954.3682007
Fang CChen ZSong SHuang XWang CWang J(2024)On Reducing Space Amplification with Multi-Column Compaction in Apache IoTDBProceedings of the VLDB Endowment10.14778/3681954.368197717:11(2974-2986)Online publication date: 30-Aug-2024
https://dl.acm.org/doi/10.14778/3681954.3681977
Show More Cited By

Index Terms

Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing

Recommendations

FactorJoin: A New Cardinality Estimation Framework for Join Queries
PACMMOD

Cardinality estimation is one of the most fundamental and challenging problems in query optimization. Neither classical nor learning-based methods yield satisfactory performance when estimating the cardinality of the join queries. They either rely on ...
Query optimization through the looking glass, and what we found running the Join Order Benchmark

Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark that works on real-life data riddled with correlations and introduces 113 complex join queries. We experimentally revisit the main ...
Cardinality estimation via learned dynamic sample selection
Abstract
Sampling is an effective approach to cardinality estimation which in turn is a key to query optimization in a DBMS. Although there have been a lot of studies on applying machine learning to cardinality estimation recently, enhancing ...
Highlights
- We propose the first cardinality estimation framework using dynamic sample selection.

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data

Proceedings of the ACM on Management of Data Volume 1, Issue 1

PACMMOD

May 2023

2807 pages

EISSN:2836-6573

DOI:10.1145/3603164

Editor:
Divyakant Agrawal
UC Santa Barbara, United States

Issue’s Table of Contents

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023

Published in PACMMOD Volume 1, Issue 1

Permissions

Request permissions for this article.

Request Permissions

Author Tags

Qualifiers

Research-article

Funding Sources

GRF 152018/20E from Hong Kong RGC
Guangdong Basic and Applied Basic Research Foundation (Grant No. 2021A1515110067)
Shenzhen Fundamental Research Program (Grant No. 20220815112848002)

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

21
Total Citations
View Citations
727
Total Downloads

Downloads (Last 12 months)287
Downloads (Last 6 weeks)15

Reflects downloads up to 02 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Bergmann RHartmann CHabich DLehner W(2025)An Elephant Under the Microscope: Analyzing the Interaction of Optimizer Components in PostgreSQLProceedings of the ACM on Management of Data10.1145/37096593:1(1-28)Online publication date: 11-Feb-2025
https://dl.acm.org/doi/10.1145/3709659
Zhang WLim WButrovich MPavlo A(2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
https://dl.acm.org/doi/10.14778/3681954.3682007
Fang CChen ZSong SHuang XWang CWang J(2024)On Reducing Space Amplification with Multi-Column Compaction in Apache IoTDBProceedings of the VLDB Endowment10.14778/3681954.368197717:11(2974-2986)Online publication date: 30-Aug-2024
https://dl.acm.org/doi/10.14778/3681954.3681977
Hu HQiu JWang HLiang BZou S(2024)DIDS: Double Indices and Double Summarizations for Fast Similarity SearchProceedings of the VLDB Endowment10.14778/3665844.366585117:9(2198-2211)Online publication date: 6-Aug-2024
https://dl.acm.org/doi/10.14778/3665844.3665851
Xiong HZhang HWang ZHe ZWang PWang X(2024)CIVET: Exploring Compact Index for Variable-Length Subsequence Matching on Time SeriesProceedings of the VLDB Endowment10.14778/3665844.366584517:9(2123-2135)Online publication date: 6-Aug-2024
https://dl.acm.org/doi/10.14778/3665844.3665845
Maroulis SStamatopoulos VPapastefanatos GTerrovitis M(2024)Visualization-Aware Time Series Min-Max Caching with Error Bound GuaranteesProceedings of the VLDB Endowment10.14778/3659437.365946017:8(2091-2103)Online publication date: 31-May-2024
https://dl.acm.org/doi/10.14778/3659437.3659460
Li ZDing BYao LLi YXiao XZhou J(2024)Performance-Based Pricing for Federated Learning via AuctionProceedings of the VLDB Endowment10.14778/3648160.364816917:6(1269-1282)Online publication date: 3-May-2024
https://dl.acm.org/doi/10.14778/3648160.3648169
Breve BCimino GDeufemia V(2024)Hybrid Prompt Learning for Generating Justifications of Security Risks in Automation RulesACM Transactions on Intelligent Systems and Technology10.1145/3675401Online publication date: 29-Jun-2024
https://dl.acm.org/doi/10.1145/3675401
Meruje Ferreira LCoelho FPereira J(2024)Databases in Edge and Fog Environments: A SurveyACM Computing Surveys10.1145/366600156:11(1-40)Online publication date: 8-Jul-2024
https://dl.acm.org/doi/10.1145/3666001
Gao JLong C(2024)RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor SearchProceedings of the ACM on Management of Data10.1145/36549702:3(1-27)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654970
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents