ABSTRACT
Query optimization has long been fundamental for database systems. There are cracks in the edifice, however, as the complexity of modern query workloads outpace what database systems can manage well. Automatic tools are needed for database vendors, such as IBM with Db2, to help customers troubleshoot their performance problems, as manual troubleshooting is painstaking. To manage complex and large workloads, we develop a distributed system called dGALO that learns recurring problem patterns in query plans over workloads. dGALO employs these problem patterns to build a RDF-based, SPARQL-queried knowledge-base of plan-rewrite remedies. We illustrate a distributed implementation of dGALO on Apache Spark with efficient partitioning strategies for load balancing. The system employs additional pruning strategies via clustering, which yields a fine-grained trade off between runtime and accuracy. dGALO uses its knowledge-base to re-optimize queries, often to dramatic effect, and is a valuable tool for the development team to refine the optimizer with new techniques. We demonstrate by an experimental study over the TPC-DS benchmark the efficiency and effectiveness of our techniques.
Supplemental Material
- S. Agrawal, S. Chaudhuri, L. Kollar, A. Marathe, V. Narasayya, and M. Syamala. 2005. Database tuning advisor for microsoft sql server 2005. In SIGMOD. 930--932.Google Scholar
- R. Barber, G. Lohman, I. Pandis, V. Raman, R. Sidle, G. Attaluri, N. Chainani, S. Lightstone, and D. Sharpe. 2014. Memory-efficient Hash Joins. PVLDB, Vol. 8, 4 (2014), 353--364.Google ScholarDigital Library
- N. Bruno, S. Chaudhuri, and R. Ramamurthy. 2009a. Interactive Plan Hints for Query Optimization. In SIGMOD. 1043--1046.Google Scholar
- N. Bruno, S. Chaudhuri, and R. Ramamurthy. 2009b. Power Hints for Query Optimization. In ICDE. 469--480.Google Scholar
- IBM Knowledge Center. 2021. IBM InfoSphere Optim Query Workload Tuner. https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.idm.tools.doc/doc/c0057033.htmlGoogle Scholar
- S. Chaudhuri and K. Shim. 1999. Optimization of Queries with User-defined Predicates. TODS, Vol. 24, 2 (1999), 177--228.Google ScholarDigital Library
- B. Dageville, D. Das, K. Dias, K. Yagoub, M. Zait, and M. Ziauddin. 2004. Automatic SQL Tuning in Oracle 10G. In VLDB. 1098--1109.Google Scholar
- G. Damasio, S. Bryson, V. Corvinelli, P. Godfrey, P. Mierzejewski, A. Mihaylov, J. Szlichta, and C. Zuzarte. 2019 a. GALO: Guided Automated Learning for re-Optimization. PVLDB, Vol. 12, 12 (2019), 1778--1781.Google ScholarDigital Library
- G. Damasio, V. Corvinelli, P. Godfrey, P. Mierzejewski, A. Mihaylov, J. Szlichta, and C. Zuzarte. 2019 b. Guided Automated Learning for query workload re-Optimization. PVLDB, Vol. 12, 12 (2019), 2010--2021.Google ScholarDigital Library
- G. Damasio, P. Mierzejewski, J. Szlichta, and C. Zuzarte. 2016a. OptImatch: Semantic web system for query problem determination. In ICDE. 1334--1337.Google Scholar
- G. Damasio, P. Mierzejewski, J. Szlichta, and C. Zuzarte. 2016b. Query Performance Problem Determination with Knowledge Base in Semantic Web System OptImatch. In EDBT. 515--526.Google Scholar
- J. Dean and S. Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM, Vol. 51, 1 (2008), 107--113.Google ScholarDigital Library
- A. El-Helw, I. F. Ilyas, and C. Zuzarte. 2009. Statadvisor: Recommending statistical views. PVLDB, Vol. 2, 2 (2009), 1306--1317.Google ScholarDigital Library
- M. Ester, H. Kriegel, J. Sander, and X. Xu. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD. 226--231.Google Scholar
- J. Gryz, Q. Wang, X. Qian, and C. Zuzarte. 2008. SQL Queries with CASE Expressions. In ISMIS. 351--360.Google Scholar
- Z. He, B. S. Lee, and R. Snapp. 2005. Self-tuning Cost Modeling of User-defined Functions in an Object-relational DBMS. TODS, Vol. 30, 3 (2005), 812--853.Google ScholarDigital Library
- Viktor Leis, Bernhard Radke, Andrey Gubichev, Alfons Kemper, and Thomas Neumann. 2017. Cardinality Estimation Done Right: Index-Based Join Sampling. In CIDR.Google Scholar
- Henry Liu, Mingbin Xu, Ziting Yu, Vincent Corvinelli, and Calisto Zuzarte. 2015. Cardinality estimation using neural networks. In Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering. IBM Corp., 53--59.Google ScholarDigital Library
- R. Marcus and O. Papaemmanouil. 2018. Deep Reinforcement Learning for Join Order Enumeration. In aiDM. Article 3, 4 pages.Google Scholar
- V. Muntés-Mulero, J. Aguilar-Saborit, C. Zuzarte, and J. Larriba-Pey. 2006. CGO: A Sound Genetic Optimizer for Cyclic Query Graphs. In ICCS. 156--163.Google Scholar
- J. Ortiz, M. Balazinska, J. Gehrke, and S. Keerthi. 2018. Learning State Representations for Query Optimization with Deep Reinforcement Learning. In DEEM. 1--4.Google Scholar
- A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. C. Mowry, M. Perron, and I. Quah. 2017. Self-Driving Database Management Systems. In CIDR.Google Scholar
- P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price. 1979. Access path selection in a relational database management system. In SIGMOD. 23--34.Google Scholar
- J. Szlichta, P. Godfrey, L. Golab, M. Kargar, and D. Srivastava. 2017. Effective and Complete Discovery of Order Dependencies via Set-based Axiomatization. PVLDB, Vol. 10, 7 (2017), 721--732.Google ScholarDigital Library
- M. Ziauddin, D. Das, H. Su, Y. Zhu, and K. Yagoub. 2008. Optimizer Plan Change Management: Improved Stability and Performance in Oracle 11G. PVLDB, Vol. 1, 2 (2008), 1346--1355.Google ScholarDigital Library
- D. C. Zilio, J. Rao, S. Lightstone, G. Lohman, A. Storm, C. Garcia-Arellano, and S. Fadden. 2004 a. DB2 Design Advisor: Integrated Automatic Physical Database Design. In VLDB. 1087--1097.Google Scholar
- D. C. Zilio, C. Zuzarte, S. Lightstone, W. Ma, G. M. Lohman, R. J. Cochrane, H. Pirahesh, L. Colby, J. Gryz, E. Alton, and G. Valentin. 2004 b. Recommending materialized views and indexes with the IBM DB2 Design Advisor. In ICAC. 180--187.Google Scholar
Index Terms
- Scalable Learning to Troubleshoot Query Performance Problems
Recommendations
Query Optimization for Ontology-Mediated Query Answering
WWW '24: Proceedings of the ACM on Web Conference 2024Ontology-mediated query answering (OMQA) consists in asking database queries on knowledge bases (KBs); a KB is a set of facts called the KB's database, which is described by domain knowledge called the KB's ontology. A widely-investigated OMQA technique ...
Scalable Query Profiling Employing Purging and Elimination Technique
ICIBE '18: Proceedings of the 4th International Conference on Industrial and Business EngineeringReusing Queries contributes in speeding up the performance of database in responding to future queries as it can reduce the number of database queries to be processed and sent back to the user. Profiling a query in a machine who requested a query in ...
Query Folding
ICDE '96: Proceedings of the Twelfth International Conference on Data EngineeringQuery folding refers to the activity of determining if and how a query can be answered using a given set of resources, which might be materialized views, cached results of previous queries, or queries answerable by other databases. We investigate query ...
Comments