research-article

Scalable Learning to Troubleshoot Query Performance Problems

Authors:
Alexandar Mihaylov

Ontario Tech University & IBM CAS, Toronto, Canada

Ontario Tech University & IBM CAS, Toronto, Canada
View Profile

,
Vincent Corvinelli

IBM Ltd., Toronto, Canada

IBM Ltd., Toronto, Canada
View Profile

,
Parke Godfrey

York University & IBM CAS, Toronto, Canada

York University & IBM CAS, Toronto, Canada
View Profile

,
Piotr Mierzejewski

IBM Ltd., Toronto, Canada

IBM Ltd., Toronto, Canada
View Profile

,
Jaroslaw Szlichta

Ontario Tech University & IBM CAS, Toronto, Canada

Ontario Tech University & IBM CAS, Toronto, Canada
View Profile

,
Calisto Zuzarte

IBM Ltd., Toronto, Canada

IBM Ltd., Toronto, Canada
View Profile

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge ManagementOctober 2021Pages 4016–4025https://doi.org/10.1145/3459637.3481947

Published:30 October 2021Publication History

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pages 4016–4025

ABSTRACT

Query optimization has long been fundamental for database systems. There are cracks in the edifice, however, as the complexity of modern query workloads outpace what database systems can manage well. Automatic tools are needed for database vendors, such as IBM with Db2, to help customers troubleshoot their performance problems, as manual troubleshooting is painstaking. To manage complex and large workloads, we develop a distributed system called dGALO that learns recurring problem patterns in query plans over workloads. dGALO employs these problem patterns to build a RDF-based, SPARQL-queried knowledge-base of plan-rewrite remedies. We illustrate a distributed implementation of dGALO on Apache Spark with efficient partitioning strategies for load balancing. The system employs additional pruning strategies via clustering, which yields a fine-grained trade off between runtime and accuracy. dGALO uses its knowledge-base to re-optimize queries, often to dramatic effect, and is a valuable tool for the development team to refine the optimizer with new techniques. We demonstrate by an experimental study over the TPC-DS benchmark the efficiency and effectiveness of our techniques.

Supplemental Material

CIKM-2021-video-afp0562.mp4

mp4

19.8 MB

Download

References

S. Agrawal, S. Chaudhuri, L. Kollar, A. Marathe, V. Narasayya, and M. Syamala. 2005. Database tuning advisor for microsoft sql server 2005. In SIGMOD. 930--932.Google Scholar
R. Barber, G. Lohman, I. Pandis, V. Raman, R. Sidle, G. Attaluri, N. Chainani, S. Lightstone, and D. Sharpe. 2014. Memory-efficient Hash Joins. PVLDB, Vol. 8, 4 (2014), 353--364.Google ScholarDigital Library
N. Bruno, S. Chaudhuri, and R. Ramamurthy. 2009a. Interactive Plan Hints for Query Optimization. In SIGMOD. 1043--1046.Google Scholar
N. Bruno, S. Chaudhuri, and R. Ramamurthy. 2009b. Power Hints for Query Optimization. In ICDE. 469--480.Google Scholar
IBM Knowledge Center. 2021. IBM InfoSphere Optim Query Workload Tuner. https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.idm.tools.doc/doc/c0057033.htmlGoogle Scholar
S. Chaudhuri and K. Shim. 1999. Optimization of Queries with User-defined Predicates. TODS, Vol. 24, 2 (1999), 177--228.Google ScholarDigital Library
B. Dageville, D. Das, K. Dias, K. Yagoub, M. Zait, and M. Ziauddin. 2004. Automatic SQL Tuning in Oracle 10G. In VLDB. 1098--1109.Google Scholar
G. Damasio, S. Bryson, V. Corvinelli, P. Godfrey, P. Mierzejewski, A. Mihaylov, J. Szlichta, and C. Zuzarte. 2019 a. GALO: Guided Automated Learning for re-Optimization. PVLDB, Vol. 12, 12 (2019), 1778--1781.Google ScholarDigital Library
G. Damasio, V. Corvinelli, P. Godfrey, P. Mierzejewski, A. Mihaylov, J. Szlichta, and C. Zuzarte. 2019 b. Guided Automated Learning for query workload re-Optimization. PVLDB, Vol. 12, 12 (2019), 2010--2021.Google ScholarDigital Library
G. Damasio, P. Mierzejewski, J. Szlichta, and C. Zuzarte. 2016a. OptImatch: Semantic web system for query problem determination. In ICDE. 1334--1337.Google Scholar
G. Damasio, P. Mierzejewski, J. Szlichta, and C. Zuzarte. 2016b. Query Performance Problem Determination with Knowledge Base in Semantic Web System OptImatch. In EDBT. 515--526.Google Scholar
J. Dean and S. Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM, Vol. 51, 1 (2008), 107--113.Google ScholarDigital Library
A. El-Helw, I. F. Ilyas, and C. Zuzarte. 2009. Statadvisor: Recommending statistical views. PVLDB, Vol. 2, 2 (2009), 1306--1317.Google ScholarDigital Library
M. Ester, H. Kriegel, J. Sander, and X. Xu. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD. 226--231.Google Scholar
J. Gryz, Q. Wang, X. Qian, and C. Zuzarte. 2008. SQL Queries with CASE Expressions. In ISMIS. 351--360.Google Scholar
Z. He, B. S. Lee, and R. Snapp. 2005. Self-tuning Cost Modeling of User-defined Functions in an Object-relational DBMS. TODS, Vol. 30, 3 (2005), 812--853.Google ScholarDigital Library
Viktor Leis, Bernhard Radke, Andrey Gubichev, Alfons Kemper, and Thomas Neumann. 2017. Cardinality Estimation Done Right: Index-Based Join Sampling. In CIDR.Google Scholar
Henry Liu, Mingbin Xu, Ziting Yu, Vincent Corvinelli, and Calisto Zuzarte. 2015. Cardinality estimation using neural networks. In Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering. IBM Corp., 53--59.Google ScholarDigital Library
R. Marcus and O. Papaemmanouil. 2018. Deep Reinforcement Learning for Join Order Enumeration. In aiDM. Article 3, 4 pages.Google Scholar
V. Muntés-Mulero, J. Aguilar-Saborit, C. Zuzarte, and J. Larriba-Pey. 2006. CGO: A Sound Genetic Optimizer for Cyclic Query Graphs. In ICCS. 156--163.Google Scholar
J. Ortiz, M. Balazinska, J. Gehrke, and S. Keerthi. 2018. Learning State Representations for Query Optimization with Deep Reinforcement Learning. In DEEM. 1--4.Google Scholar
A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. C. Mowry, M. Perron, and I. Quah. 2017. Self-Driving Database Management Systems. In CIDR.Google Scholar
P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price. 1979. Access path selection in a relational database management system. In SIGMOD. 23--34.Google Scholar
J. Szlichta, P. Godfrey, L. Golab, M. Kargar, and D. Srivastava. 2017. Effective and Complete Discovery of Order Dependencies via Set-based Axiomatization. PVLDB, Vol. 10, 7 (2017), 721--732.Google ScholarDigital Library
M. Ziauddin, D. Das, H. Su, Y. Zhu, and K. Yagoub. 2008. Optimizer Plan Change Management: Improved Stability and Performance in Oracle 11G. PVLDB, Vol. 1, 2 (2008), 1346--1355.Google ScholarDigital Library
D. C. Zilio, J. Rao, S. Lightstone, G. Lohman, A. Storm, C. Garcia-Arellano, and S. Fadden. 2004 a. DB2 Design Advisor: Integrated Automatic Physical Database Design. In VLDB. 1087--1097.Google Scholar
D. C. Zilio, C. Zuzarte, S. Lightstone, W. Ma, G. M. Lohman, R. J. Cochrane, H. Pirahesh, L. Colby, J. Gryz, E. Alton, and G. Valentin. 2004 b. Recommending materialized views and indexes with the IBM DB2 Design Advisor. In ICAC. 180--187.Google Scholar

Index Terms

Scalable Learning to Troubleshoot Query Performance Problems
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
        Query optimization

Recommendations

Query Optimization for Ontology-Mediated Query Answering
WWW '24: Proceedings of the ACM on Web Conference 2024

Ontology-mediated query answering (OMQA) consists in asking database queries on knowledge bases (KBs); a KB is a set of facts called the KB's database, which is described by domain knowledge called the KB's ontology. A widely-investigated OMQA technique ...
Read More
Scalable Query Profiling Employing Purging and Elimination Technique
ICIBE '18: Proceedings of the 4th International Conference on Industrial and Business Engineering

Reusing Queries contributes in speeding up the performance of database in responding to future queries as it can reduce the number of database queries to be processed and sent back to the user. Profiling a query in a machine who requested a query in ...
Read More
Query Folding
ICDE '96: Proceedings of the Twelfth International Conference on Data Engineering

Query folding refers to the activity of determining if and how a query can be answered using a given set of resources, which might be materialized views, cached results of previous queries, or queries answerable by other databases. We investigate query ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN:9781450384469
DOI:10.1145/3459637
General Chairs:
Gianluca Demartini
The University of Queensland, Australia
,
Guido Zuccon
The University of Queensland, Australia
,
Program Chairs:
J. Shane Culpepper
RMIT University, Australia
,
Zi Huang
The University of Queensland, Australia
,
Hanghang Tong
University of Illinois at Urbana-Champaign, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 30 October 2021
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
distributed computing
query performance problems
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate1,861of8,427submissions,22%
Upcoming Conference
CIKM '24

Sponsor:

sigir

sigir

The 33rd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2024

Boise , ID , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 2
  Total Citations
  View Citations
- 120
  Total Downloads
- Downloads (Last 12 months)42
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Scalable Learning to Troubleshoot Query Performance Problems

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Query Optimization for Ontology-Mediated Query Answering

Scalable Query Profiling Employing Purging and Elimination Technique

Query Folding