DOI: 10.1145/2806416.2806448
research-article

Comprehensible Models for Reconfiguring Enterprise Relational Databases to Avoid Incidents

Published: 17 October 2015

Abstract

Configuring enterprise database management systems (DBMSs) is a notoriously hard problem. The combinatorial parameter space makes it intractable to run the DBMS and observe its behavior in all scenarios. As a result, the database administrator has the difficult task of choosing among DBMS configurations, some of which can lead to critical incidents that hinder availability or performance. We propose using machine learning to understand how configuring a DBMS can lead to such high-risk incidents. We collect historical data from three IT environments that run both IBM DB2 and Oracle DBMS. Then, we implement several linear and non-linear multivariate models to identify and learn from high-risk configurations. We analyze their performance in terms of accuracy, cost, generalization, and interpretability. Results show that high-risk configurations can be identified with extremely high accuracy and that the database administrator can potentially benefit from the extracted rules when reconfiguring to prevent incidents.
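
The abstract does not describe the models in detail, so the following is only a minimal sketch of the general idea of learning a human-readable rule that separates high-risk from low-risk configurations. It uses a single-threshold rule (a decision stump), which is a stand-in chosen for brevity and not one of the paper's models. The parameter names (`bufferpool_mb`, `max_connections`), the configurations, and the incident labels are all invented for illustration.

```python
# Illustrative sketch only: synthetic data and hypothetical parameter names,
# not the paper's dataset, models, or results.
from typing import Dict, List, Tuple

Config = Dict[str, float]


def best_stump(configs: List[Config], labels: List[int]) -> Tuple[str, float, int, float]:
    """Find the single rule `param <= threshold` with the highest training accuracy.

    Returns (param, threshold, label_if_below, accuracy), where label_if_below
    is the class predicted when the parameter is at or below the threshold.
    """
    best: Tuple[str, float, int, float] = ("", 0.0, 0, -1.0)
    n = len(labels)
    for param in configs[0]:
        values = sorted({c[param] for c in configs})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for below in (0, 1):
                correct = sum(
                    1
                    for c, y in zip(configs, labels)
                    if (below if c[param] <= t else 1 - below) == y
                )
                acc = correct / n
                if acc > best[3]:
                    best = (param, t, below, acc)
    return best


# Hypothetical configurations: buffer pool size (MB) and connection limit.
configs = [
    {"bufferpool_mb": 128, "max_connections": 200},
    {"bufferpool_mb": 256, "max_connections": 500},
    {"bufferpool_mb": 512, "max_connections": 100},
    {"bufferpool_mb": 1024, "max_connections": 400},
    {"bufferpool_mb": 2048, "max_connections": 300},
    {"bufferpool_mb": 4096, "max_connections": 250},
]
# 1 = configuration was followed by an incident (made-up labels).
labels = [1, 1, 1, 0, 0, 0]

param, threshold, below, acc = best_stump(configs, labels)
rule = f"IF {param} <= {threshold:g} THEN {'high' if below else 'low'} risk"
print(rule, f"(training accuracy {acc:.2f})")
```

A rule of this shape is the kind of output an administrator could read directly and act on. In practice, accuracy would need to be measured on held-out data and not on the training set used here.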

Published In

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
October 2015
1998 pages
ISBN:9781450337946
DOI:10.1145/2806416

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. database configuration
  2. multivariate analysis

Qualifiers

  • Research-article

Conference

CIKM'15

Acceptance Rates

CIKM '15 Paper Acceptance Rate 165 of 646 submissions, 26%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 20 Jan 2025.

Cited By

  • (2024) Deep variability modeling to enhance reproducibility of database performance testing. Cluster Computing 27:8 (11683-11708). DOI: 10.1007/s10586-024-04533-0. Online publication date: 2-Jun-2024
  • (2021) Additive Explanations for Student Fails Detected from Course Prerequisites. 2021 International Conference of Women in Data Science at Taif University (WiDSTaif) (1-7). DOI: 10.1109/WiDSTaif52235.2021.9430238. Online publication date: 30-Mar-2021
  • (2021) DeepCM: Deep neural networks to improve accuracy prediction of database cost models. Concurrency and Computation: Practice and Experience 34:10. DOI: 10.1002/cpe.6724. Online publication date: 8-Dec-2021
  • (2018) Spatial–Temporal Prediction Models for Active Ticket Managing in Data Centers. IEEE Transactions on Network and Service Management 15:1 (39-52). DOI: 10.1109/TNSM.2018.2794409. Online publication date: Mar-2018
  • (2017) Predicting DRAM reliability in the field with machine learning. Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference: Industrial Track (15-21). DOI: 10.1145/3154448.3154451. Online publication date: 11-Dec-2017
  • (2016) Managing Data Center Tickets: Prediction and Active Sizing. 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (335-346). DOI: 10.1109/DSN.2016.38. Online publication date: Jun-2016
  • (2016) A Meta-advisor Repository for Database Physical Design. Model and Data Engineering (72-87). DOI: 10.1007/978-3-319-45547-1_6. Online publication date: 7-Sep-2016
