Bao: Making Learned Query Optimization Practical

Published: 01 June 2022

Abstract

Recent efforts applying machine learning techniques to query optimization have shown few practical gains due to substantive training overhead, inability to adapt to changes, and poor tail performance. Motivated by these difficulties, we introduce Bao (the Bandit optimizer). Bao takes advantage of the wisdom built into existing query optimizers by providing per-query optimization hints. Bao combines modern tree convolutional neural networks with Thompson sampling, a well-studied reinforcement learning algorithm. As a result, Bao automatically learns from its mistakes and adapts to changes in query workloads, data, and schema. Experimentally, we demonstrate that Bao can quickly learn strategies that improve end-to-end query execution performance, including tail latency, for several workloads containing long-running queries. In cloud environments, we show that Bao can offer both reduced costs and better performance compared with a commercial system.
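
To make the Thompson sampling idea concrete, the following is a minimal sketch, not Bao's implementation. It assumes a fixed set of hint sets (the names `default`, `no_nested_loop`, and `no_hash_join` are hypothetical), made-up mean latencies, and a simple per-arm Gaussian belief in place of the tree convolutional neural network the abstract describes, so it ignores the query plan entirely. The class `GaussianThompsonSampler` and the function `simulate` are illustrative names only.

```python
import random


class GaussianThompsonSampler:
    """Thompson sampling over a fixed set of arms with Gaussian latency beliefs.

    Assumption: each arm's latency is modeled as Gaussian with known
    observation noise, which is a stand-in for a learned, plan-aware model.
    """

    def __init__(self, arms, prior_mean=1.0, prior_var=1.0, noise_var=0.25, seed=None):
        self.arms = list(arms)
        self.noise_var = noise_var
        self.rng = random.Random(seed)
        self.mean = {a: prior_mean for a in self.arms}
        self.var = {a: prior_var for a in self.arms}
        self.pulls = {a: 0 for a in self.arms}

    def choose(self):
        # Draw one plausible latency per arm from its posterior and
        # pick the arm whose draw is smallest (we minimize latency).
        draws = {a: self.rng.gauss(self.mean[a], self.var[a] ** 0.5) for a in self.arms}
        return min(draws, key=draws.get)

    def update(self, arm, latency):
        # Conjugate Gaussian update with known observation noise.
        precision = 1.0 / self.var[arm] + 1.0 / self.noise_var
        self.mean[arm] = (self.mean[arm] / self.var[arm] + latency / self.noise_var) / precision
        self.var[arm] = 1.0 / precision
        self.pulls[arm] += 1


def simulate(rounds=300, seed=0):
    # Hypothetical hint sets with made-up mean latencies in seconds.
    true_latency = {"default": 2.0, "no_nested_loop": 1.0, "no_hash_join": 3.0}
    sampler = GaussianThompsonSampler(true_latency, seed=seed)
    env = random.Random(seed + 1)  # separate stream for the simulated workload
    for _ in range(rounds):
        arm = sampler.choose()
        observed = env.gauss(true_latency[arm], 0.2)  # noisy simulated execution
        sampler.update(arm, observed)
    return sampler.pulls


if __name__ == "__main__":
    print(simulate())
```

In this toy setting the sampler tries each hint set early on, while its beliefs are still wide, and then concentrates on the one with the lowest observed latency. Because it keeps sampling from its beliefs rather than committing to a fixed choice, the same loop would also shift its choices if the simulated latencies changed, which is the adaptive behavior the abstract attributes to Bao.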
