skip to main content
research-article

Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation

Published: 30 May 2023 Publication History

Abstract

Fast query execution requires learning-based cardinality estimators to have short inference time (as model inference time adds to end-to-end query execution time) and high estimation accuracy (which is crucial for finding good execution plan). However, existing estimators cannot meet both requirements due to the inherent tension between model complexity and estimation accuracy. We propose a novel Learning-based Progressive Cardinality Estimator (LPCE), which adopts a query re-optimization methodology. In particular, LPCE consists of an initial model (LPCE-I), which estimates cardinality before query execution, and a refinement model (LPCE-R), which progressively refines the cardinality estimations using the actual cardinalities of the executed operators. During query execution, re-optimization is triggered if the estimations of LPCE-I are found to have large errors, and more efficient execution plans are selected for the remaining operators using the refined estimations provided by LPCE-R. Both LPCE-I and LPCE-R are light-weight query-driven estimators but they achieve both good efficiency and high accuracy when used jointly. Besides designing the models for LPCE-I and LPCE-R, we also integrate re-optimization and LPCE into PostgreSQL, a popular database engine. Extensive experiments show that LPCE yields shorter end-to-end query execution time than state-of-the-art learning-based estimators.

Supplemental Material

MP4 File
Fast query execution requires learning-based cardinality estimators to have short inference time (as model inference time adds to end-to-end query execution time) and high estimation accuracy (which is crucial for finding good execution plan). However, existing estimators cannot meet both requirements due to the inherent tension between model complexity and estimation accuracy. We propose a novel Learning-based Progressive Cardinality Estimator (LPCE), which adopts a query re-optimization methodology. In particular, LPCE consists of an initial model (LPCE-I), which estimates cardinality before query execution, and a refinement model (LPCE-R), which progressively refines the cardinality estimations using the actual cardinalities of the executed operators. During query execution, re-optimization is triggered if the estimations of LPCE-I are found to have large errors, and more efficient execution plans are selected for the remaining operators using the refined estimations provided by LPCE-R. Both LPCE-I and LPCE-R are light-weight query-driven estimators but they achieve both good efficiency and high accuracy when used jointly. Besides designing the models for LPCE-I and LPCE-R, we also integrate re-optimization and LPCE into PostgreSQL, a popular database engine. Extensive experiments show that LPCE yields shorter end-to-end query execution time than state-of-the-art learning-based estimators.

References

[1]
Riham Abdel Kader, Peter Boncz, Stefan Manegold, and Maurice Van Keulen. 2009. ROX: run-time optimization of XQueries. In SIGMOD. 615--626.
[2]
Ashraf Aboulnaga and Surajit Chaudhuri. 1999. Self-tuning Histograms: Building Histograms Without Looking at Data. In SIGMOD. 181--192.
[3]
Musbah Abdulkarim Musbah Ataya and Musab AM Ali. 2019. Acceptance of Website Security on E-banking. A-Review. In IEEE 10th Control and System Graduate Research Colloquium. 201--206.
[4]
Ron Avnur and Joseph M Hellerstein. 2000. Eddies: Continuously adaptive query processing. In SIGMOD. 261--272.
[5]
Olivier Chapelle and Lihong Li. 2011. An empirical evaluation of thompson sampling. Advances in neural information processing systems 24 (2011).
[6]
Amol Deshpande, Minos N. Garofalakis, and Rajeev Rastogi. 2001. Independence is Good: Dependency-Based Histogram Synopses for High-Dimensional Data. In SIGMOD. 199--210.
[7]
Anshuman Dutt and Jayant R Haritsa. 2014. Plan bouquets: query processing without selectivity estimation. In SIGMOD. 1039--1050.
[8]
Anshuman Dutt, Chi Wang, Vivek Narasayya, and Surajit Chaudhuri. 2020. Efficiently approximating selectivity functions using low overhead regression models. PVLDB 13, 12 (2020), 2215--2228.
[9]
Anshuman Dutt, Chi Wang, Azade Nazi, Srikanth Kandula, Vivek Narasayya, and Surajit Chaudhuri. 2019. Selectivity estimation for range predicates using lightweight models. PVLDB 12, 9 (2019), 1044--1057.
[10]
Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, et al . 2021. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. PVLDB 15, 4 (2021), 1--12.
[11]
Shohedul Hasan, Saravanan Thirumuruganathan, Jees Augustine, Nick Koudas, and Gautam Das. 2020. Deep learning models for selectivity estimation of multi-attribute queries. In SIGMOD. 1035--1050.
[12]
Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kristian Kersting, and Carsten Binnig. 2020. DeepDB: Learn from Data, not from Queries! PVLDB 13, 7 (2020), 992--1005.
[13]
Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
[14]
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural computation 9, 8 (1997), 1735--1780.
[15]
Navin Kabra and David J DeWitt. 1998. Efficient mid-query re-optimization of sub-optimal query execution plans. In SIGMOD. 106--117.
[16]
Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter A. Boncz, and Alfons Kemper. 2019. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In CIDR.
[17]
Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, and Ion Stoica. 2018. Learning to optimize join queries with d reinforcement learning. arXiv preprint arXiv:1808.03196 (2018).
[18]
Hai Lan, Zhifeng Bao, and Yuwei Peng. 2021. A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration. Data Science and Engineering (2021), 1--16.
[19]
Tao Lei, Yu Zhang, and Yoav Artzi. 2017. Training RNNs as Fast as CNNs. arXiv preprint arXiv:1709.02755 (2017).
[20]
Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? PVLDB 9, 3 (2015), 204--215.
[21]
Viktor Leis, Bernhard Radke, Andrey Gubichev, Alfons Kemper, and Thomas Neumann. 2017. Cardinality Estimation Done Right: Index-Based Join Sampling. In CIDR.
[22]
Jie Liu, Wenqian Dong, Qingqing Zhou, and Dong Li. 2021. Fauce: fast and accurate deep ensembles with uncertainty for cardinality estimation. PVLDB 14, 11 (2021), 1950--1963.
[23]
Guy Lohman. 2014. Is query optimization a "solved" problem ?. In Proc. Workshop on Database Query Optimization. 10.
[24]
Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoia, Sanchit Misra, Alfons Kemper, Thomas Neumann, and Tim Kraska. 2021. Flow-Loss: Learning Cardinality Estimates That Matter. PVLDB 14, 11 (2021).
[25]
Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making learned query optimization practical. In SIGMOD. 1275--1288.
[26]
Ryan C. Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A Learned Query Optimizer. PVLDB 12, 11 (2019), 1705--1718.
[27]
Volker Markl, Vijayshankar Raman, David Simmen, Guy Lohman, Hamid Pirahesh, and Miso Cilimdzic. 2004. Robust query processing through progressive optimization. In SIGMOD. 659--670.
[28]
Thomas Neumann and César A. Galindo-Legaria. 2013. Taking the Edge off Cardinality Estimation Errors using Incremental Execution. In Datenbanksysteme für Business, Technologie und Web (BTW). 73--92.
[29]
Danilo Rezende and Shakir Mohamed. 2015. Variational inference with normalizing flows. In ICML. 1530--1538.
[30]
Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. 2015. FitNets: Hints for Thin Deep Nets. In ICLR.
[31]
Michael Stillger, Guy M Lohman, Volker Markl, and Mokhtar Kandil. 2001. LEO-DB2's learning optimizer. In VLDB. 19--28.
[32]
Ji Sun and Guoliang Li. 2019. An End-to-End Learning-based Cost Estimator. PVLDB 13, 3 (2019), 307--319.
[33]
Ji Sun, Jintao Zhang, Zhaoyan Sun, Guoliang Li, and Nan Tang. 2021. Learned cardinality estimation: A design space exploration and a comparative evaluation. PVLDB 15, 1 (2021), 85--97.
[34]
Immanuel Trummer, Junxiong Wang, Ziyun Wei, Deepak Maram, Samuel Moseley, Saehan Jo, Joseph Antonakakis, and Ankush Rayabhari. 2021. Skinnerdb: Regret-bounded query evaluation via reinforcement learning. TODS 46, 3 (2021), 1--45.
[35]
Fang Wang, Xiao Yan, Manlung Yiu, Shuai Li, Zunyao Mao, and Bo Tang. 2022. Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation (Technical report). https://github.com/Eilowangfang/LPCE/blob/master/techreport.pdf
[36]
Jiayi Wang, Chengliang Chai, Jiabin Liu, and Guoliang Li. 2021. FACE: a normalizing flow based cardinality estimator. PVLDB 15, 1 (2021), 72--84.
[37]
Xiaoying Wang, Changbo Qu, Weiyuan Wu, Jiannan Wang, and Qingqing Zhou. 2021. Are We Ready For Learned Cardinality Estimation? PVLDB 14, 9 (2021), 1640--1654.
[38]
Peizhi Wu and Gao Cong. 2021. A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation. In SIGMOD. 2009--2022.
[39]
Renzhi Wu, Bolin Ding, Xu Chu, Zhewei Wei, Xiening Dai, Tao Guan, and Jingren Zhou. 2021. Learning to be a Statistician: Learned Estimator for Number of Distinct Values. PVLDB 15, 2 (2021), 272--284.
[40]
Wentao Wu, Jeffrey F Naughton, and Harneet Singh. 2016. Sampling-based query re-optimization. In SIGMOD. 1721--1736.
[41]
Ziniu Wu, Amir Shaikhha, Rong Zhu, Kai Zeng, Yuxing Han, and Jingren Zhou. 2020. BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation. arXiv e-prints (2020), arXiv--2012.
[42]
Ziniu Wu, Rong Zhu, Andreas Pfadler, Yuxing Han, Jiangneng Li, Zhengping Qian, Kai Zeng, and Jingren Zhou. 2020. FSPN: A New Class of Probabilistic Graphical Model. CoRR abs/2011.09020 (2020). https://arxiv.org/abs/2011.09020
[43]
Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, and Ion Stoica. 2022. Balsa: Learning a Query Optimizer Without Expert Demonstrations. In SIGMOD. 931--944.
[44]
Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, and Ion Stoica. 2020. NeuroCard: One Cardinality Estimator for All Tables. PVLDB 14, 1 (2020), 61--73.
[45]
Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Peter Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, and Ion Stoica. 2019. Deep Unsupervised Cardinality Estimation. PVLDB 13, 3 (2019), 279--292.
[46]
Xiang Yu, Guoliang Li, Chengliang Chai, and Nan Tang. 2020. Reinforcement learning with tree-lstm for join order selection. In ICDE. 1297--1308.
[47]
Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, and Alexander Smola. 2017. Deep sets. arXiv preprint arXiv:1703.06114 (2017).
[48]
Rong Zhu, Ziniu Wu, Yuxing Han, Kai Zeng, Andreas Pfadler, Zhengping Qian, Jingren Zhou, and Bin Cui. 2021. FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. PVLDB 14, 9 (2021), 1489--1502.
[49]
Rong Zhu, Tianjing Zeng, Andreas Pfadler, Wei Chen, Bolin Ding, and Jingren Zhou. 2021. Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query Size. arXiv preprint arXiv:2112.03458 (2021).

Cited By

View all
  • (2025)An Elephant Under the Microscope: Analyzing the Interaction of Optimizer Components in PostgreSQLProceedings of the ACM on Management of Data10.1145/37096593:1(1-28)Online publication date: 11-Feb-2025
  • (2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
  • (2024)On Reducing Space Amplification with Multi-Column Compaction in Apache IoTDBProceedings of the VLDB Endowment10.14778/3681954.368197717:11(2974-2986)Online publication date: 30-Aug-2024
  • Show More Cited By

Index Terms

  1. Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the ACM on Management of Data
    Proceedings of the ACM on Management of Data  Volume 1, Issue 1
    PACMMOD
    May 2023
    2807 pages
    EISSN:2836-6573
    DOI:10.1145/3603164
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 May 2023
    Published in PACMMOD Volume 1, Issue 1

    Permissions

    Request permissions for this article.

    Author Tags

    1. cardinality estimation
    2. query optimization

    Qualifiers

    • Research-article

    Funding Sources

    • GRF 152018/20E from Hong Kong RGC
    • Guangdong Basic and Applied Basic Research Foundation (Grant No. 2021A1515110067)
    • Shenzhen Fundamental Research Program (Grant No. 20220815112848002)

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)287
    • Downloads (Last 6 weeks)15
    Reflects downloads up to 02 Mar 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)An Elephant Under the Microscope: Analyzing the Interaction of Optimizer Components in PostgreSQLProceedings of the ACM on Management of Data10.1145/37096593:1(1-28)Online publication date: 11-Feb-2025
    • (2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
    • (2024)On Reducing Space Amplification with Multi-Column Compaction in Apache IoTDBProceedings of the VLDB Endowment10.14778/3681954.368197717:11(2974-2986)Online publication date: 30-Aug-2024
    • (2024)DIDS: Double Indices and Double Summarizations for Fast Similarity SearchProceedings of the VLDB Endowment10.14778/3665844.366585117:9(2198-2211)Online publication date: 6-Aug-2024
    • (2024)CIVET: Exploring Compact Index for Variable-Length Subsequence Matching on Time SeriesProceedings of the VLDB Endowment10.14778/3665844.366584517:9(2123-2135)Online publication date: 6-Aug-2024
    • (2024)Visualization-Aware Time Series Min-Max Caching with Error Bound GuaranteesProceedings of the VLDB Endowment10.14778/3659437.365946017:8(2091-2103)Online publication date: 31-May-2024
    • (2024)Performance-Based Pricing for Federated Learning via AuctionProceedings of the VLDB Endowment10.14778/3648160.364816917:6(1269-1282)Online publication date: 3-May-2024
    • (2024)Hybrid Prompt Learning for Generating Justifications of Security Risks in Automation RulesACM Transactions on Intelligent Systems and Technology10.1145/3675401Online publication date: 29-Jun-2024
    • (2024)Databases in Edge and Fog Environments: A SurveyACM Computing Surveys10.1145/366600156:11(1-40)Online publication date: 8-Jul-2024
    • (2024)RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor SearchProceedings of the ACM on Management of Data10.1145/36549702:3(1-27)Online publication date: 30-May-2024
    • Show More Cited By

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media