skip to main content
10.1145/3299869.3324957acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

AI Meets AI: Leveraging Query Executions to Improve Index Recommendations

Published:25 June 2019Publication History

ABSTRACT

State-of-the-art index tuners rely on query optimizer's cost estimates to search for the index configuration with the largest estimated execution cost improvement`. Due to well-known limitations in optimizer's estimates, in a significant fraction of cases, an index estimated to improve a query's execution cost, e.g., CPU time, makes that worse when implemented. Such errors are a major impediment for automated indexing in production systems. We observe that comparing the execution cost of two plans of the same query corresponding to different index configurations is a key step during index tuning. Instead of using optimizer's estimates for such comparison, our key insight is that formulating it as a classification task in machine learning results in significantly higher accuracy. We present a study of the design space for this classification problem. We further show how to integrate this classifier into the state-of-the-art index tuners with minimal modifications, i.e., how artificial intelligence (AI) can benefit automated indexing (AI). Our evaluation using industry-standard benchmarks and a large number of real customer workloads demonstrates up to 5x reduction in the errors in identifying the cheaper plan in a pair, which eliminates almost all query execution cost regressions when the model is used in index tuning.

References

  1. Martín Abadi et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI. 265--283. https://www.usenix.org/conference/ osdi16/technical-sessions/presentation/abadiGoogle ScholarGoogle Scholar
  2. Sanjay Agrawal, Surajit Chaudhuri, Lubor Kollár, Arunprasad P. Marathe, Vivek R. Narasayya, and Manoj Syamala. 2004. Database Tuning Advisor for Microsoft SQL Server 2005. In VLDB. 1110--1121.Google ScholarGoogle Scholar
  3. Sanjay Agrawal, Eric Chu, and Vivek Narasayya. 2006. Automatic physical design tuning: workload as a sequence. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data. ACM, 683--694. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Sanjay Agrawal, Vivek R. Narasayya, and Beverly Yang. 2004. Integrating Vertical and Horizontal Partitioning Into Automated Physical Database Design. In SIGMOD. 359--370. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Mert Akdere, Ugur Çetintemel, Matteo Riondato, Eli Upfal, and Stanley B. Zdonik. 2012. Learning-based Query Performance Modeling and Prediction. In ICDE. IEEE Computer Society, Washington, DC, USA, 390--401. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Ioannis Alagiannis, Stratos Idreos, and Anastasia Ailamaki. 2014. H2O: a hands-free adaptive store. In SIGMOD. 1103--1114.Google ScholarGoogle Scholar
  7. Joy Arulraj, Andrew Pavlo, and Prashanth Menon. 2016. Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads. In SIGMOD. 583--598.Google ScholarGoogle Scholar
  8. Azure SQL Database {n. d.}. Azure SQL Database. https://azure. microsoft.com/en-us/services/sql-database/.Google ScholarGoogle Scholar
  9. Christopher M. Bishop. 2007. Pattern recognition and machine learning, 5th Edition. Springer. http://www.worldcat.org/oclc/71008143Google ScholarGoogle Scholar
  10. Renata Borovica, Ioannis Alagiannis, and Anastasia Ailamaki. 2012. Automated physical designers: what you see is (not) what you get. In DBTest. 9.Google ScholarGoogle Scholar
  11. Kurt P. Brown, Michael J. Carey, and Miron Livny. 1996. Goal-oriented Buffer Management Revisited. In SIGMOD. ACM, New York, NY, USA, 353--364. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Nicolas Bruno and Surajit Chaudhuri. 2007. An Online Approach to Physical Design Tuning. In ICDE. 826--835.Google ScholarGoogle Scholar
  13. Nicolas Bruno and Surajit Chaudhuri. 2007. Physical design refinement: The 'merge-reduce' approach. ACM Trans. Database Syst. 32, 4 (2007), 28. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Surajit Chaudhuri. 1998. An Overview of Query Optimization in Relational Systems. In PODS. ACM, New York, NY, USA, 34--43. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Surajit Chaudhuri. 2009. Query optimizers: time to rethink the contract?. In SIGMOD. 961--968. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Surajit Chaudhuri, Vivek Narasayya, and Ravi Ramamurthy. 2008. A pay-as-you-go framework for query execution feedback. PVLDB 1, 1 (2008), 1141--1152. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Surajit Chaudhuri and Vivek R. Narasayya. 1997. An Efficient Cost- Driven Index Selection Tool for Microsoft SQL Server. In VLDB. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 146--155. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Surajit Chaudhuri and Vivek R. Narasayya. 1998. AutoAdmin 'What-if' Index Analysis Utility. In SIGMOD. 367--378. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Surajit Chaudhuri and Vivek R. Narasayya. 2007. Self-Tuning Database Systems: A Decade of Progress. In VLDB. 3--14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Chungmin Melvin Chen and Nick Roussopoulos. 1994. Adaptive Selectivity Estimation Using Query Feedback. In SIGMOD. ACM, New York, NY, USA, 161--172.Google ScholarGoogle Scholar
  21. Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In SIGKDD. 785--794. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. 2010. Schism: a workload-driven approach to database replication and partitioning. PVLDB 3, 1--2 (2010), 48--57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Benoît Dageville, Dinesh Das, Karl Dias, Khaled Yagoub, Mohamed Zaït, and Mohamed Ziauddin. 2004. Automatic SQL Tuning in Oracle 10g. In VLDB. 1098--1109. http://www.vldb.org/conf/2004/IND4P2.PDF Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Sudipto Das, Miroslav Grbic, Igor Ilic, Isidora Jovandic, Andrija Jovanovic, Vivek Narasayya, Miodrag Radulovic, Maja Stikic, Gaoxiang Xu, and Surajit Chaudhuri. 2019. Automatically Indexing Millions of Databases in Microsoft Azure SQL Database. In SIGMOD. ACM.Google ScholarGoogle Scholar
  25. Debabrata Dash, Neoklis Polyzotis, and Anastasia Ailamaki. 2011. CoPhy: A Scalable, Portable, and Interactive Index Advisor for Large Workloads. PVLDB 4, 6 (2011), 362--372.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, and Vivek R. Narasayya. 2018. Plan Stitch: Harnessing the Best of Many Plans. PVLDB 11, 10 (2018), 1123--1136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Jennie Duggan, Ugur Çetintemel, Olga Papaemmanouil, and Eli Upfal. 2011. Performance prediction for concurrent database workloads. In SIGMOD. 337--348. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Adam Dziedzic, Jingjing Wang, Sudipto Das, Bolin Ding, Vivek R. Narasayya, and Manoj Syamala. 2018. Columnstore and B+ tree - Are Hybrid Physical Designs Important?. In SIGMOD. 177--190.Google ScholarGoogle Scholar
  29. Kareem El Gebaly and Ashraf Aboulnaga. 2008. Robustness in automatic physical database design. In EDBT. 145--156. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Sheldon J. Finkelstein, Mario Schkolnick, and Paolo Tiberio. 1988. Physical Database Design for Relational Databases. ACM Trans. Database Syst. 13, 1 (1988), 91--128. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Janet L. Wiener, Armando Fox, Michael Jordan, and David Patterson. 2009. Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. In ICDE. IEEE Computer Society, Washington, DC, USA, 592--603. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In SIGKDD. 855--864. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In ICCV. 1026--1034. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. 770--778.Google ScholarGoogle Scholar
  36. Kurt Hornik. 1991. Approximation capabilities of multilayer feedforward networks. Neural Networks 4, 2 (1991), 251--257. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Kurt Hornik, Maxwell B. Stinchcombe, and Halbert White. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2, 5 (1989), 359--366. Google ScholarGoogle ScholarCross RefCross Ref
  38. Stratos Idreos, Martin L. Kersten, and Stefan Manegold. 2007. Database Cracking. In CIDR. 68--78.Google ScholarGoogle Scholar
  39. Stratos Idreos, Stefan Manegold, and Goetz Graefe. 2012. Adaptive indexing in modern database kernels. In EDBT. 566--569.Google ScholarGoogle Scholar
  40. Stratos Idreos, Stefan Manegold, Harumi A. Kuno, and Goetz Graefe. 2011. Merging What's Cracked, Cracking What's Merged: Adaptive Indexing in Main-Memory Column-Stores. PVLDB 4, 9 (2011), 585--597.Google ScholarGoogle Scholar
  41. Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S. Kester, and Demi Guo. 2018. The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models. In SIGMOD. 535--550.Google ScholarGoogle Scholar
  42. Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In NIPS. 3149--3157. http://papers.nips.cc/paper/ 6907-lightgbm-a-highly-efficient-gradient-boosting-decision-treeGoogle ScholarGoogle Scholar
  43. Keras 2018. Keras: The Python Deep Learning library. https://keras.io/.Google ScholarGoogle Scholar
  44. Michael S. Kester, Manos Athanassoulis, and Stratos Idreos. 2017. Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?. In SIGMOD. 715--730. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014). arXiv:1412.6980 http://arxiv.org/abs/1412.6980Google ScholarGoogle Scholar
  46. Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10--15, 2018. 489--504. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? PVLDB 9, 3 (Nov. 2015), 204--215. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The adaptive radix tree: ARTful indexing for main-memory databases. In ICDE. 38-- 49.Google ScholarGoogle Scholar
  49. Jiexing Li, Arnd Christian König, Vivek R. Narasayya, and Surajit Chaudhuri. 2012. Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques. PVLDB 5, 11 (2012), 1555-- 1566. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Guy Lohman. 2014. Is Query Optimization a "Solved" Problem? http: //wp.sigmod.org/?p=1075.Google ScholarGoogle Scholar
  51. ML.NET 2018. Machine Learning for .NET. https://github.com/dotnet/ machinelearning.Google ScholarGoogle Scholar
  52. Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C. Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang. 2017. Self- Driving Database Management Systems. In CIDR.Google ScholarGoogle Scholar
  53. Lorien Y. Pratt. 1992. Discriminability-Based Transfer between Neural Networks. In NIPS. 204--211. http://papers.nips.cc/paper/ 641-discriminability-based-transfer-between-neural-networks Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Program for TPC-H Data Generation with Skew {n. d.}. Program for TPC-H Data Generation with Skew. https://www.microsoft.com/ en-us/download/details.aspx?id=52430.Google ScholarGoogle Scholar
  55. Jun Rao, Chun Zhang, Nimrod Megiddo, and Guy M. Lohman. 2002. Automating physical database design in a parallel database. In SIGMOD. 558--569. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Karl Schnaitter, Serge Abiteboul, Tova Milo, and Neoklis Polyzotis. 2006. COLT: continuous on-line tuning. In SIGMOD. 793--795. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. scikit-learn 2018. scikit-learn: Machine Learning in Python. http: //scikit-learn.org/stable/.Google ScholarGoogle Scholar
  58. Ankur Sharma, Felix Martin Schuhknecht, and Jens Dittrich. 2018. The Case for Automatic Database Administration using Deep Reinforcement Learning. CoRR abs/1801.05643 (2018).Google ScholarGoogle Scholar
  59. Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. 2015. Training Very Deep Networks. In NIPS. 2377--2385. http://papers.nips. cc/paper/5850-training-very-deep-networks Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Michael Stillger, Guy M. Lohman, Volker Markl, and Mokhtar Kandil. 2001. LEO - DB2's LEarning Optimizer. In VLDB. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 19--28. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Michael Stonebraker. 1974. The choice of partial inversions and combined indices. International Journal of Parallel Programming 3, 2 (1974), 167--188.Google ScholarGoogle Scholar
  62. Adam J Storm, Christian Garcia-Arellano, Sam S Lightstone, Yixin Diao, and Maheswaran Surendra. 2006. Adaptive self-tuning memory in DB2. In VLDB. VLDB Endowment, 1081--1092. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. TPC Benchmark DS: Standard Specification v2.6.0. {n. d.}. TPC Benchmark DS: Standard Specification v2.6.0. http://www.tpc.org/tpcds/.Google ScholarGoogle Scholar
  64. TPC Benchmark H: Standard Specification v2.17.3. {n. d.}. TPC Benchmark H: Standard Specification v2.17.3. http://www.tpc.org/tpch/ default.asp.Google ScholarGoogle Scholar
  65. Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohman, and Alan Skelley. 2000. DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. In ICDE. 101--110. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. 2017. Automatic Database Management System Tuning Through Large-scale Machine Learning. In SIGMOD. ACM, New York, NY, USA, 1009--1024. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Gerhard Weikum, Axel Moenkeberg, Christof Hasse, and Peter Zabback. 2002. Self-tuning database technology and information services: from wishful thinking to viable engineering. In VLDB. Elsevier, 20--31. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Wentao Wu, Yun Chi, Shenghuo Zhu, Jun'ichi Tatemura, Hakan Hacigümüs, and Jeffrey F. Naughton. 2013. Predicting Query Execution Time: Are Optimizer Cost Models Really Unusable?. In ICDE. IEEE Computer Society, Washington, DC, USA, 1081--1092. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. Wentao Wu, Jeffrey F. Naughton, and Harneet Singh. 2016. Sampling- Based Query Re-Optimization. In SIGMOD. ACM, New York, NY, USA, 1721--1736. Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. How transferable are features in deep neural networks?. In NIPS. 3320--3328. http://papers.nips.cc/paper/ 5347-how-transferable-are-features-in-deep-neural-networks Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Haijun Zhang, Shuang Wang, Xiaofei Xu, Tommy W. S. Chow, and Q. M. Jonathan Wu. 2018. Tree2Vector: Learning a Vectorial Representation for Tree-Structured Data. IEEE Trans. Neural Netw. Learning Syst. 29, 11 (2018), 5304--5318.Google ScholarGoogle ScholarCross RefCross Ref
  72. Daniel C. Zilio, Jun Rao, Sam Lightstone, Guy Lohman, Adam Storm, Christian Garcia-Arellano, and Scott Fadden. 2004. DB2 Design Advisor: Integrated Automatic Physical Database Design. In VLDB. VLDB Endowment, 1087--1097. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. AI Meets AI: Leveraging Query Executions to Improve Index Recommendations

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
          June 2019
          2106 pages
          ISBN:9781450356435
          DOI:10.1145/3299869

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 25 June 2019

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          SIGMOD '19 Paper Acceptance Rate88of430submissions,20%Overall Acceptance Rate785of4,003submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader