ABSTRACT
State-of-the-art index tuners rely on query optimizer's cost estimates to search for the index configuration with the largest estimated execution cost improvement`. Due to well-known limitations in optimizer's estimates, in a significant fraction of cases, an index estimated to improve a query's execution cost, e.g., CPU time, makes that worse when implemented. Such errors are a major impediment for automated indexing in production systems. We observe that comparing the execution cost of two plans of the same query corresponding to different index configurations is a key step during index tuning. Instead of using optimizer's estimates for such comparison, our key insight is that formulating it as a classification task in machine learning results in significantly higher accuracy. We present a study of the design space for this classification problem. We further show how to integrate this classifier into the state-of-the-art index tuners with minimal modifications, i.e., how artificial intelligence (AI) can benefit automated indexing (AI). Our evaluation using industry-standard benchmarks and a large number of real customer workloads demonstrates up to 5x reduction in the errors in identifying the cheaper plan in a pair, which eliminates almost all query execution cost regressions when the model is used in index tuning.
- Martín Abadi et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI. 265--283. https://www.usenix.org/conference/ osdi16/technical-sessions/presentation/abadiGoogle Scholar
- Sanjay Agrawal, Surajit Chaudhuri, Lubor Kollár, Arunprasad P. Marathe, Vivek R. Narasayya, and Manoj Syamala. 2004. Database Tuning Advisor for Microsoft SQL Server 2005. In VLDB. 1110--1121.Google Scholar
- Sanjay Agrawal, Eric Chu, and Vivek Narasayya. 2006. Automatic physical design tuning: workload as a sequence. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data. ACM, 683--694. Google ScholarDigital Library
- Sanjay Agrawal, Vivek R. Narasayya, and Beverly Yang. 2004. Integrating Vertical and Horizontal Partitioning Into Automated Physical Database Design. In SIGMOD. 359--370. Google ScholarDigital Library
- Mert Akdere, Ugur Çetintemel, Matteo Riondato, Eli Upfal, and Stanley B. Zdonik. 2012. Learning-based Query Performance Modeling and Prediction. In ICDE. IEEE Computer Society, Washington, DC, USA, 390--401. Google ScholarDigital Library
- Ioannis Alagiannis, Stratos Idreos, and Anastasia Ailamaki. 2014. H2O: a hands-free adaptive store. In SIGMOD. 1103--1114.Google Scholar
- Joy Arulraj, Andrew Pavlo, and Prashanth Menon. 2016. Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads. In SIGMOD. 583--598.Google Scholar
- Azure SQL Database {n. d.}. Azure SQL Database. https://azure. microsoft.com/en-us/services/sql-database/.Google Scholar
- Christopher M. Bishop. 2007. Pattern recognition and machine learning, 5th Edition. Springer. http://www.worldcat.org/oclc/71008143Google Scholar
- Renata Borovica, Ioannis Alagiannis, and Anastasia Ailamaki. 2012. Automated physical designers: what you see is (not) what you get. In DBTest. 9.Google Scholar
- Kurt P. Brown, Michael J. Carey, and Miron Livny. 1996. Goal-oriented Buffer Management Revisited. In SIGMOD. ACM, New York, NY, USA, 353--364. Google ScholarDigital Library
- Nicolas Bruno and Surajit Chaudhuri. 2007. An Online Approach to Physical Design Tuning. In ICDE. 826--835.Google Scholar
- Nicolas Bruno and Surajit Chaudhuri. 2007. Physical design refinement: The 'merge-reduce' approach. ACM Trans. Database Syst. 32, 4 (2007), 28. Google ScholarDigital Library
- Surajit Chaudhuri. 1998. An Overview of Query Optimization in Relational Systems. In PODS. ACM, New York, NY, USA, 34--43. Google ScholarDigital Library
- Surajit Chaudhuri. 2009. Query optimizers: time to rethink the contract?. In SIGMOD. 961--968. Google ScholarDigital Library
- Surajit Chaudhuri, Vivek Narasayya, and Ravi Ramamurthy. 2008. A pay-as-you-go framework for query execution feedback. PVLDB 1, 1 (2008), 1141--1152. Google ScholarDigital Library
- Surajit Chaudhuri and Vivek R. Narasayya. 1997. An Efficient Cost- Driven Index Selection Tool for Microsoft SQL Server. In VLDB. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 146--155. Google ScholarDigital Library
- Surajit Chaudhuri and Vivek R. Narasayya. 1998. AutoAdmin 'What-if' Index Analysis Utility. In SIGMOD. 367--378. Google ScholarDigital Library
- Surajit Chaudhuri and Vivek R. Narasayya. 2007. Self-Tuning Database Systems: A Decade of Progress. In VLDB. 3--14. Google ScholarDigital Library
- Chungmin Melvin Chen and Nick Roussopoulos. 1994. Adaptive Selectivity Estimation Using Query Feedback. In SIGMOD. ACM, New York, NY, USA, 161--172.Google Scholar
- Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In SIGKDD. 785--794. Google ScholarDigital Library
- Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. 2010. Schism: a workload-driven approach to database replication and partitioning. PVLDB 3, 1--2 (2010), 48--57. Google ScholarDigital Library
- Benoît Dageville, Dinesh Das, Karl Dias, Khaled Yagoub, Mohamed Zaït, and Mohamed Ziauddin. 2004. Automatic SQL Tuning in Oracle 10g. In VLDB. 1098--1109. http://www.vldb.org/conf/2004/IND4P2.PDF Google ScholarDigital Library
- Sudipto Das, Miroslav Grbic, Igor Ilic, Isidora Jovandic, Andrija Jovanovic, Vivek Narasayya, Miodrag Radulovic, Maja Stikic, Gaoxiang Xu, and Surajit Chaudhuri. 2019. Automatically Indexing Millions of Databases in Microsoft Azure SQL Database. In SIGMOD. ACM.Google Scholar
- Debabrata Dash, Neoklis Polyzotis, and Anastasia Ailamaki. 2011. CoPhy: A Scalable, Portable, and Interactive Index Advisor for Large Workloads. PVLDB 4, 6 (2011), 362--372.Google ScholarDigital Library
- Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, and Vivek R. Narasayya. 2018. Plan Stitch: Harnessing the Best of Many Plans. PVLDB 11, 10 (2018), 1123--1136. Google ScholarDigital Library
- Jennie Duggan, Ugur Çetintemel, Olga Papaemmanouil, and Eli Upfal. 2011. Performance prediction for concurrent database workloads. In SIGMOD. 337--348. Google ScholarDigital Library
- Adam Dziedzic, Jingjing Wang, Sudipto Das, Bolin Ding, Vivek R. Narasayya, and Manoj Syamala. 2018. Columnstore and B+ tree - Are Hybrid Physical Designs Important?. In SIGMOD. 177--190.Google Scholar
- Kareem El Gebaly and Ashraf Aboulnaga. 2008. Robustness in automatic physical database design. In EDBT. 145--156. Google ScholarDigital Library
- Sheldon J. Finkelstein, Mario Schkolnick, and Paolo Tiberio. 1988. Physical Database Design for Relational Databases. ACM Trans. Database Syst. 13, 1 (1988), 91--128. Google ScholarDigital Library
- Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Janet L. Wiener, Armando Fox, Michael Jordan, and David Patterson. 2009. Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. In ICDE. IEEE Computer Society, Washington, DC, USA, 592--603. Google ScholarDigital Library
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org. Google ScholarDigital Library
- Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In SIGKDD. 855--864. Google ScholarDigital Library
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In ICCV. 1026--1034. Google ScholarDigital Library
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. 770--778.Google Scholar
- Kurt Hornik. 1991. Approximation capabilities of multilayer feedforward networks. Neural Networks 4, 2 (1991), 251--257. Google ScholarDigital Library
- Kurt Hornik, Maxwell B. Stinchcombe, and Halbert White. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2, 5 (1989), 359--366. Google ScholarCross Ref
- Stratos Idreos, Martin L. Kersten, and Stefan Manegold. 2007. Database Cracking. In CIDR. 68--78.Google Scholar
- Stratos Idreos, Stefan Manegold, and Goetz Graefe. 2012. Adaptive indexing in modern database kernels. In EDBT. 566--569.Google Scholar
- Stratos Idreos, Stefan Manegold, Harumi A. Kuno, and Goetz Graefe. 2011. Merging What's Cracked, Cracking What's Merged: Adaptive Indexing in Main-Memory Column-Stores. PVLDB 4, 9 (2011), 585--597.Google Scholar
- Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S. Kester, and Demi Guo. 2018. The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models. In SIGMOD. 535--550.Google Scholar
- Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In NIPS. 3149--3157. http://papers.nips.cc/paper/ 6907-lightgbm-a-highly-efficient-gradient-boosting-decision-treeGoogle Scholar
- Keras 2018. Keras: The Python Deep Learning library. https://keras.io/.Google Scholar
- Michael S. Kester, Manos Athanassoulis, and Stratos Idreos. 2017. Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?. In SIGMOD. 715--730. Google ScholarDigital Library
- Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014). arXiv:1412.6980 http://arxiv.org/abs/1412.6980Google Scholar
- Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10--15, 2018. 489--504. Google ScholarDigital Library
- Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? PVLDB 9, 3 (Nov. 2015), 204--215. Google ScholarDigital Library
- Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The adaptive radix tree: ARTful indexing for main-memory databases. In ICDE. 38-- 49.Google Scholar
- Jiexing Li, Arnd Christian König, Vivek R. Narasayya, and Surajit Chaudhuri. 2012. Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques. PVLDB 5, 11 (2012), 1555-- 1566. Google ScholarDigital Library
- Guy Lohman. 2014. Is Query Optimization a "Solved" Problem? http: //wp.sigmod.org/?p=1075.Google Scholar
- ML.NET 2018. Machine Learning for .NET. https://github.com/dotnet/ machinelearning.Google Scholar
- Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C. Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang. 2017. Self- Driving Database Management Systems. In CIDR.Google Scholar
- Lorien Y. Pratt. 1992. Discriminability-Based Transfer between Neural Networks. In NIPS. 204--211. http://papers.nips.cc/paper/ 641-discriminability-based-transfer-between-neural-networks Google ScholarDigital Library
- Program for TPC-H Data Generation with Skew {n. d.}. Program for TPC-H Data Generation with Skew. https://www.microsoft.com/ en-us/download/details.aspx?id=52430.Google Scholar
- Jun Rao, Chun Zhang, Nimrod Megiddo, and Guy M. Lohman. 2002. Automating physical database design in a parallel database. In SIGMOD. 558--569. Google ScholarDigital Library
- Karl Schnaitter, Serge Abiteboul, Tova Milo, and Neoklis Polyzotis. 2006. COLT: continuous on-line tuning. In SIGMOD. 793--795. Google ScholarDigital Library
- scikit-learn 2018. scikit-learn: Machine Learning in Python. http: //scikit-learn.org/stable/.Google Scholar
- Ankur Sharma, Felix Martin Schuhknecht, and Jens Dittrich. 2018. The Case for Automatic Database Administration using Deep Reinforcement Learning. CoRR abs/1801.05643 (2018).Google Scholar
- Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. 2015. Training Very Deep Networks. In NIPS. 2377--2385. http://papers.nips. cc/paper/5850-training-very-deep-networks Google ScholarDigital Library
- Michael Stillger, Guy M. Lohman, Volker Markl, and Mokhtar Kandil. 2001. LEO - DB2's LEarning Optimizer. In VLDB. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 19--28. Google ScholarDigital Library
- Michael Stonebraker. 1974. The choice of partial inversions and combined indices. International Journal of Parallel Programming 3, 2 (1974), 167--188.Google Scholar
- Adam J Storm, Christian Garcia-Arellano, Sam S Lightstone, Yixin Diao, and Maheswaran Surendra. 2006. Adaptive self-tuning memory in DB2. In VLDB. VLDB Endowment, 1081--1092. Google ScholarDigital Library
- TPC Benchmark DS: Standard Specification v2.6.0. {n. d.}. TPC Benchmark DS: Standard Specification v2.6.0. http://www.tpc.org/tpcds/.Google Scholar
- TPC Benchmark H: Standard Specification v2.17.3. {n. d.}. TPC Benchmark H: Standard Specification v2.17.3. http://www.tpc.org/tpch/ default.asp.Google Scholar
- Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohman, and Alan Skelley. 2000. DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. In ICDE. 101--110. Google ScholarDigital Library
- Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. 2017. Automatic Database Management System Tuning Through Large-scale Machine Learning. In SIGMOD. ACM, New York, NY, USA, 1009--1024. Google ScholarDigital Library
- Gerhard Weikum, Axel Moenkeberg, Christof Hasse, and Peter Zabback. 2002. Self-tuning database technology and information services: from wishful thinking to viable engineering. In VLDB. Elsevier, 20--31. Google ScholarDigital Library
- Wentao Wu, Yun Chi, Shenghuo Zhu, Jun'ichi Tatemura, Hakan Hacigümüs, and Jeffrey F. Naughton. 2013. Predicting Query Execution Time: Are Optimizer Cost Models Really Unusable?. In ICDE. IEEE Computer Society, Washington, DC, USA, 1081--1092. Google ScholarDigital Library
- Wentao Wu, Jeffrey F. Naughton, and Harneet Singh. 2016. Sampling- Based Query Re-Optimization. In SIGMOD. ACM, New York, NY, USA, 1721--1736. Google ScholarDigital Library
- Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. How transferable are features in deep neural networks?. In NIPS. 3320--3328. http://papers.nips.cc/paper/ 5347-how-transferable-are-features-in-deep-neural-networks Google ScholarDigital Library
- Haijun Zhang, Shuang Wang, Xiaofei Xu, Tommy W. S. Chow, and Q. M. Jonathan Wu. 2018. Tree2Vector: Learning a Vectorial Representation for Tree-Structured Data. IEEE Trans. Neural Netw. Learning Syst. 29, 11 (2018), 5304--5318.Google ScholarCross Ref
- Daniel C. Zilio, Jun Rao, Sam Lightstone, Guy Lohman, Adam Storm, Christian Garcia-Arellano, and Scott Fadden. 2004. DB2 Design Advisor: Integrated Automatic Physical Database Design. In VLDB. VLDB Endowment, 1087--1097. Google ScholarDigital Library
Index Terms
- AI Meets AI: Leveraging Query Executions to Improve Index Recommendations
Recommendations
Automatically Indexing Millions of Databases in Microsoft Azure SQL Database
SIGMOD '19: Proceedings of the 2019 International Conference on Management of DataAn appropriate set of indexes can result in orders of magnitude better query performance. Index management is a challenging task even for expert human administrators. Fully automating this process is of significant value. We describe the challenges, ...
An index selection method without repeated optimizer estimations
The index selection problem (ISP) concerns the selection of an appropriate index set to minimize the total cost for a given workload containing read and update queries. Since the ISP has been proven to be an NP-hard problem, most studies focus on ...
Learning an Index Advisor with Deep Reinforcement Learning
Web and Big DataAbstractIndexes are crucial for the efficient processing of database workloads and an appropriately selected set of indexes can drastically improve query processing performance. However, the selection of beneficial indexes is a non-trivial problem and ...
Comments