Bao: Making Learned Query Optimization Practical

Authors:

Parimarjan Negi,

Mohammad Alizadeh,

Tim Kraska

ACM SIGMOD Record, Volume 51, Issue 1

Pages 6 - 13

https://doi.org/10.1145/3542700.3542703

Published: 01 June 2022

Abstract

Recent efforts applying machine learning techniques to query optimization have shown few practical gains due to substantive training overhead, inability to adapt to changes, and poor tail performance. Motivated by these difficulties, we introduce Bao (the Bandit optimizer). Bao takes advantage of the wisdom built into existing query optimizers by providing per-query optimization hints. Bao combines modern tree convolutional neural networks with Thompson sampling, a well-studied reinforcement learning algorithm. As a result, Bao automatically learns from its mistakes and adapts to changes in query workloads, data, and schema. Experimentally, we demonstrate that Bao can quickly learn strategies that improve end-to-end query execution performance, including tail latency, for several workloads containing long-running queries. In cloud environments, we show that Bao can offer both reduced costs and better performance compared with a commercial system.
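
To make the bandit idea in the abstract concrete, the following is a minimal sketch of Thompson sampling over a small set of optimizer hint sets. It is not Bao's implementation: Bao is described as using a tree convolutional neural network as its model, whereas this sketch substitutes a simple per-arm Gaussian posterior with no query context. The hint-set names, the simulated latencies, and the `GaussianThompsonSampler` and `simulate` helpers are all hypothetical and exist only for illustration.

```python
import random


class GaussianThompsonSampler:
    """Thompson sampling over a fixed set of arms with Gaussian rewards.

    Each arm keeps a Normal posterior over its mean latency, assuming a
    known observation-noise variance (a simplifying assumption).
    """

    def __init__(self, arms, prior_mean=1.0, prior_var=1.0, noise_var=0.05, seed=0):
        self.arms = list(arms)
        self.noise_var = noise_var
        self.rng = random.Random(seed)
        self.mean = {a: prior_mean for a in self.arms}
        self.var = {a: prior_var for a in self.arms}

    def choose(self):
        # Draw one plausible mean latency per arm from its posterior,
        # then pick the arm whose draw is lowest.
        draws = {a: self.rng.gauss(self.mean[a], self.var[a] ** 0.5) for a in self.arms}
        return min(draws, key=draws.get)

    def update(self, arm, latency):
        # Conjugate Normal-Normal update of the chosen arm's posterior.
        precision = 1.0 / self.var[arm] + 1.0 / self.noise_var
        self.mean[arm] = (self.mean[arm] / self.var[arm] + latency / self.noise_var) / precision
        self.var[arm] = 1.0 / precision


def simulate(rounds=500, seed=0):
    # Hypothetical hint sets with made-up true mean latencies (seconds).
    true_latency = {"default": 1.0, "no_nested_loop": 0.6, "no_hash_join": 1.4}
    sampler = GaussianThompsonSampler(true_latency, seed=seed)
    env = random.Random(seed + 1)
    counts = {a: 0 for a in true_latency}
    for _ in range(rounds):
        arm = sampler.choose()
        latency = env.gauss(true_latency[arm], 0.2)  # simulated execution time
        sampler.update(arm, latency)
        counts[arm] += 1
    return sampler, counts


if __name__ == "__main__":
    sampler, counts = simulate()
    print(counts)
```

Because each choice is made by sampling from the posterior rather than taking the current best estimate, arms that look poor early on are still tried occasionally. In this simplified setting, that is the mechanism by which a bandit can recover from early mistakes and shift as observed latencies change.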


Published In

ACM SIGMOD Record Volume 51, Issue 1

March 2022

90 pages

ISSN:0163-5808

DOI:10.1145/3542700

Editors:
Rada Chirkova, North Carolina State University
Vanessa Braganholo, Universidade Federal Fluminense
Wim Martens, University of Bayreuth
Manos Athanassoulis, DBrainstorming
Marcelo Arenas, Research Highlights
Marianne Winslett, University of Illinois
Jun Yang, Duke University
Susan B. Davidson, The Future of Data(base) Education
Lyublena Antova, Datometry
Aaron J. Elmore, University of Chicago
Kyriakos Mouratidis, Singapore Management University
Dan Olteanu, University of Oxford
Immanuel Trummer, Cornell University
Yannis Velegrakis, Utrecht University
Renata Borovica-Gajic, Surveys
Tamer Özsu, University of Waterloo
Pınar Tözün, IT University of Copenhagen
Wook-Shin Han, Research and Vision columns
Kenneth Ross, Research Highlights

Copyright © 2022 Copyright is held by the owner/author(s).

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Cited By

Bergmann R, Hartmann C, Habich D, Lehner W (2025). An Elephant Under the Microscope: Analyzing the Interaction of Optimizer Components in PostgreSQL. Proceedings of the ACM on Management of Data, 3(1):1-28. Online publication date: 11-Feb-2025.
https://dl.acm.org/doi/10.1145/3709659

Giannakouris V, Trummer I (2025). λ-Tune: Harnessing Large Language Models for Automated Database System Tuning. Proceedings of the ACM on Management of Data, 3(1):1-26. Online publication date: 11-Feb-2025.
https://dl.acm.org/doi/10.1145/3709652

Shankhdhar P, Liu F, Narale J, Sun J, Schlussel R, Antova L (2024). Presto's History-Based Query Optimizer. Proceedings of the VLDB Endowment, 17(12):4077-4089. Online publication date: 8-Nov-2024.
https://dl.acm.org/doi/10.14778/3685800.3685828

Yu G, Wu Z, Kossmann F, Li T, Markakis M, Ngom A, Madden S, Kraska T (2024). Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD. Proceedings of the VLDB Endowment, 17(11):3629-3643. Online publication date: 1-Jul-2024.
https://dl.acm.org/doi/10.14778/3681954.3682026

Reiner S, Grossniklaus M (2024). Sample-Efficient Cardinality Estimation Using Geometric Deep Learning. Proceedings of the VLDB Endowment, 17(4):740-752. Online publication date: 5-Mar-2024.
https://dl.acm.org/doi/10.14778/3636218.3636229

Wei Z, Trummer I (2024). ROME: Robust Query Optimization via Parallel Multi-Plan Execution. Proceedings of the ACM on Management of Data, 2(3):1-25. Online publication date: 30-May-2024.
https://dl.acm.org/doi/10.1145/3654973

Siddiqui T, Wu W (2024). ML-Powered Index Tuning: An Overview of Recent Progress and Open Challenges. ACM SIGMOD Record, 52(4):19-30. Online publication date: 19-Jan-2024.
https://dl.acm.org/doi/10.1145/3641832.3641836

Huang H, Siddiqui T, Alotaibi R, Curino C, Leeka J, Jindal A, Zhao J, Camacho-Rodríguez J, Tian Y (2024). Sibyl: Forecasting Time-Evolving Query Workloads. Proceedings of the ACM on Management of Data, 2(1):1-27. Online publication date: 26-Mar-2024.
https://dl.acm.org/doi/10.1145/3639308

Wu P, Ives Z (2024). Modeling Shifting Workloads for Learned Database Systems. Proceedings of the ACM on Management of Data, 2(1):1-27. Online publication date: 26-Mar-2024.
https://dl.acm.org/doi/10.1145/3639293

Giannakouris V, Trummer I, Barcelo P, Sanchez-Pi N, Meliou A, Sudarshan S (2024). Demonstrating λ-Tune: Exploiting Large Language Models for Workload-Adaptive Database System Tuning. Companion of the 2024 International Conference on Management of Data, pages 508-511. Online publication date: 9-Jun-2024.
https://dl.acm.org/doi/10.1145/3626246.3654751
