Tun-OCM: A model-driven approach to support database tuning decision making

https://doi.org/10.1016/j.dss.2021.113538

Highlights

  • The development of a DSS must account for the issues that affect DB performance, which are addressed by DB tuning decisions.

  • DB tuning is essentially a configuration management (CM) task and a given configuration decision may impact others.

  • The Tun-OCM model provides formal and platform-independent knowledge that fosters interoperability among different DBMSs.

  • Tun-OCM increases auditability and enables predictive analysis of the impact of tuning actions on existing structures.

  • An ontology pattern language for CM can be applied to any DSS domain in which part of the problem is a configuration task.

Abstract

Database tuning is a task executed by Database Administrators (DBAs) based on their practical experience and on tuning systems, which support DBA actions towards improving the performance of a database system. It is a notoriously complex task that requires precise domain knowledge about possible database configurations. Ideally, a DBA should keep track of several Database Management System (DBMS) parameters, configure data structures, and be aware of possible interference among several database (DB) configurations. We claim that an automatic tuning system is a decision support system and that DB tuning may also be seen as a configuration management task. Therefore, we may characterize it by means of a formal domain conceptualization, benefiting from existing control practices and computational support in the configuration management domain. This work presents Tun-OCM, a conceptual model represented as a well-founded ontology that encompasses a novel characterization of the database tuning domain as a configuration management conceptualization to support decision making. We develop and represent Tun-OCM using the CM-OPL methodology and its underlying language. The benefits of Tun-OCM are discussed by instantiating it in a real scenario.

Introduction

Database tuning is a continuous process whose goal is to increase the performance of applications accessing a database. It is typically executed by a team of Database Administrators (DBAs). They are responsible for adjusting and experimenting with several combinations of parameters that impact the physical characteristics of a computational environment, such as the buffer size, the amount of data transferred between disk and memory for each disk access, or the number of checkpoint segments. They also perform several related tasks, such as deciding where data should be physically allocated across distributed platforms, and creating and maintaining secondary data access structures. In addition, DBAs typically deal with a lack of standards at the physical database level due to the existence of a highly heterogeneous and competitive industry. There are specificities in each Database Management System (DBMS) implementation that impact the physical design of the database and the choice of adequate access structures to improve the performance of the information systems connected to a DBMS. Additionally, each DBA decision needs to be disseminated to the entire team, as it may impact the computational environment and, consequently, future decisions.

There are many heuristics proposed in the database literature to help DBAs improve the performance of DBMSs [[1], [2], [3], [4], [5], [6]]. However, their implementation typically requires commands or parameters that are native to particular DBMSs. Consequently, there is a demand for DBAs to acquire a large amount of specific knowledge, which impairs interoperability among different DBMSs and the management of data infrastructures composed of heterogeneous DBMS platforms. This scenario is increasingly common in modern organizations, especially in the case of data service providers, making the DB tuning task even more difficult.

To address these issues, it is crucial to establish, among DBAs, a common conceptualization of DBMS elements and heuristic definitions at a level that is independent of specific DBMSs. This common conceptualization also facilitates the understanding of each decision made, either by the team or by the tuning system, throughout the tuning task. It can also enable semantic support for the automatic implementation of such decisions in different DBMS platforms.
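To make the role of such a conceptualization concrete, the sketch below renders one platform-independent tuning action into DBMS-specific commands. All names here (`TuningAction`, `render_for`, the templates) are illustrative assumptions, not part of Tun-OCM itself:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TuningAction:
    """An abstract tuning decision, independent of any DBMS dialect."""
    kind: str            # e.g. "create_index"
    table: str
    columns: tuple

# Per-DBMS templates capture the dialect-specific syntax for the same decision.
TEMPLATES = {
    "postgresql": "CREATE INDEX idx_{table}_{cols} ON {table} USING btree ({collist});",
    "oracle":     "CREATE INDEX idx_{table}_{cols} ON {table} ({collist});",
}

def render_for(action: TuningAction, dbms: str) -> str:
    """Translate one abstract action into a concrete command for `dbms`."""
    return TEMPLATES[dbms].format(
        table=action.table,
        cols="_".join(action.columns),
        collist=", ".join(action.columns),
    )

action = TuningAction("create_index", "orders", ("customer_id",))
print(render_for(action, "postgresql"))
# → CREATE INDEX idx_orders_customer_id ON orders USING btree (customer_id);
```

The decision ("create this index") is recorded once at the conceptual level; only the final rendering step is dialect-specific, which is what enables a shared understanding across heterogeneous platforms.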

A complicating factor is that organizations are increasingly outsourcing their physical data storage to cloud environment providers. In this case, performance parameter adjustments need to be defined and controlled through service level agreements (SLAs). There is no control over some physical configurations, which further complicates the configuration management of the DBMS. Moreover, some parameters may influence and even preclude the use of others. For example, the PostgreSQL DBMS has two settings: shared_buffers and checkpoint_segments. The shared_buffers configuration parameter determines how much memory is dedicated to PostgreSQL for caching data. The checkpoint_segments parameter sets the maximum number of log file segments between automatic WAL (Write-Ahead Logging) checkpoints. PostgreSQL recommends, for larger settings of shared_buffers, a corresponding increase in checkpoint_segments, to spread out the process of writing vast quantities of new or changed data over a longer period of time [7].
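The shared_buffers/checkpoint_segments interaction above could appear in postgresql.conf as follows. This is an illustrative fragment for PostgreSQL 9.x (where checkpoint_segments still exists); the values are examples, not recommendations:

```ini
# Illustrative postgresql.conf fragment (PostgreSQL 9.x; example values only).
shared_buffers = 2GB          # memory dedicated to caching data pages
checkpoint_segments = 32      # max WAL segments between automatic checkpoints;
                              # raised alongside shared_buffers so the larger
                              # volume of dirty pages is written out over a
                              # longer period rather than in one burst
```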

It is thus necessary to construct a precise and extensible conceptual framework to support corporate data management strategies that establish quality criteria at the physical level. Its precision enables the translation of data quality constraints to the physical level as accurately as possible without loss of semantics. Its extensibility allows such constraints to be particularized for each operational environment (operating system, DBMS, memory architecture, distributed architecture) available in the organization. Each physical data management platform may be semantically integrated into a defined conceptual backbone. For this, a data model is used as an abstraction to provide a high-level representation of this universe of interest.

This paper proposes Tun-OCM, a model-driven approach to support database tuning decision-making adopting a configuration management perspective. To build Tun-OCM, we have used ontological analysis and an ontology pattern language (OPL) to support our conceptual model development strategy. OPLs have been largely used for constructing expressive models through the reuse of well-grounded and validated fragments (patterns), improving model quality and speeding up the development process. We derive the proposed model using CM-OPL, a specific ontology pattern language defined for this task and applied to the configuration management domain. The proposal is evaluated through a proof of concept by instantiating the proposed model in an actual scenario, showing that the modeling process becomes more agile and precise, with fewer ambiguities. In addition, Tun-OCM allows tracking each step taken during the tuning process until a decision is made, so that its impacts may be audited and assessed.

This paper is organized as follows. Section 2 characterizes the database self-tuning system as a decision support system and Section 3 discusses it from a configuration management perspective. Section 4 presents an overview of CM-OPL, our proposed OPL applied to the CM domain. Section 5 illustrates how CM-OPL was used to develop Tun-OCM, the proposed database tuning model that we instantiate for evaluation purposes. Section 6 discusses how the perspective of the CM domain can help the database tuning task. Finally, Section 7 concludes this paper.

Section snippets

The database self-tuning system as a decision support system

Stair and Reynolds [8] present a well-known model developed by Herbert Simon, in which decision-making is represented as a problem solving component (Fig. 1).

In the first stage of decision making (intelligence), potential problems or opportunities are identified and defined together with resource and environmental constraints. Alternative solutions to the problem are developed in the design stage. The choice stage requires selecting a course of action. In the next stage (implementation), a
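As a minimal sketch, the four stages can be mapped onto one cycle of a self-tuning loop. Everything here (thresholds, candidate actions, the `tuning_cycle` function) is a hypothetical illustration of the mapping, not the paper's actual system:

```python
from enum import Enum

class Stage(Enum):
    """Simon's decision-making stages, as described in the text."""
    INTELLIGENCE = "identify performance problems and constraints"
    DESIGN = "enumerate alternative tuning actions"
    CHOICE = "select a course of action"
    IMPLEMENTATION = "apply the chosen action to the DBMS"

def tuning_cycle(workload_stats):
    """Walk one decision cycle; each step corresponds to a Simon stage."""
    # INTELLIGENCE: detect slow queries from collected statistics (ms latencies).
    problems = [q for q, ms in workload_stats.items() if ms > 100]
    # DESIGN: propose candidate actions for each problem query.
    candidates = {q: [f"create index for {q}", f"materialize view for {q}"]
                  for q in problems}
    # CHOICE: pick the first candidate (a stand-in for a real cost model).
    chosen = {q: alts[0] for q, alts in candidates.items()}
    # IMPLEMENTATION: a real system would emit DDL here; we just return the plan.
    return chosen

plan = tuning_cycle({"q1": 250, "q2": 40})
# plan == {"q1": "create index for q1"}
```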

Configuration management helps tuning decision-making

According to the ISO 10007:2017 standard, configuration is a set of physical and functional characteristics that describes a product at a given time. In the context of this standard, the term product should be interpreted as applicable to all product categories, e.g., documents, facilities, firmware, hardware, software, tools, materials, processes, services, and systems. Configuration management (CM) is used for documenting product or service configurations – the interrelated functional and

CM-OPL: A configuration management ontology pattern language

CM-OPL facilitates the reuse of conceptual models in the configuration management domain; like other OPLs, it values reuse and guides the construction of models from existing fragments.

This paper addresses the application of ontology patterns (OPs) in a particular domain. A domain-related ontology pattern (DROP) is a reusable fragment extracted from reference ontologies that captures part of the core knowledge related to a specific field (in the present paper, the configuration

The Tun-OCM model

CM-OPL provides a network of reusable patterns to foster the construction of high quality conceptual models that may be considered as domain ontologies. These patterns have been applied to build Tun-OCM, a well-founded conceptual model that encompasses a novel characterization of the database tuning domain as a configuration management conceptualization. This section presents the database tuning domain from the perspective of a CM task, and describes the proposed Tun-OCM model using the CM-OPL

Discussion

The instances of the Tun-OCM model precisely describe which changes were made to each check-in performed. For example, CheckIn01 creates three indexes (CIndex01, CIndex02, and CIndex03) and CheckIn02 creates a materialized view. Thus, it is possible to analyze the tuning actions that may impact existing structures. For example, if it is detected that after the last check-in, some old index is no longer used, it is possible to analyze which actions (indexes or materialized view) are acting on
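The check-in bookkeeping described above can be sketched as follows. The instance names follow those mentioned in the text (CheckIn01, CIndex01, ...); the materialized-view identifier `CMatView01` and the helper `origin_of` are hypothetical:

```python
# Each check-in records the configuration items it created, enabling the
# kind of impact analysis described in the text.
check_ins = {
    "CheckIn01": ["CIndex01", "CIndex02", "CIndex03"],  # three indexes
    "CheckIn02": ["CMatView01"],                        # a materialized view
}

def origin_of(item: str) -> str:
    """Return the check-in that created a given configuration item."""
    for check_in, items in check_ins.items():
        if item in items:
            return check_in
    raise KeyError(item)

# If monitoring flags CIndex02 as unused after the last check-in, we can
# trace it back to its check-in and review the sibling actions introduced
# alongside it before deciding whether to drop it.
assert origin_of("CIndex02") == "CheckIn01"
```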

Conclusion

OPLs facilitate the reuse of model and ontology fragments by providing a path towards the (re)use of predefined patterns. We may use such patterns to improve correctness by adding models that are more precisely specified through ontologies. The configuration task is commonly present, in some form, in many systems and may help in database tuning decision-making. In this work, CM-OPL was applied to develop Tun-OCM, a database tuning conceptual model used by a tool to help DBAs

Acknowledgements

The authors thank FAPERJ, CAPES and CNPq for partially supporting this work.


References (29)

  • Z. Ding et al., A software cybernetics approach to self-tuning performance of on-line transaction processing systems, J. Syst. Softw. (2017)
  • F.B. Ruy et al., From reference ontologies to ontology patterns and back, Data & Knowledge Eng. (2017)
  • A. Pavlo et al., External vs. internal: an essay on machine learning agents for autonomous database management systems
  • N.N. Noon et al., Automated performance tuning of data management systems with materializations and indices, J. Comp. Commun. (2016)
  • A.C. Almeida et al., A non-intrusive approach for automated physical design tuning
  • E. Morelli et al., Autonomous re-indexing
  • L. Bellatreche et al., Bringing together partitioning, materialized views and indexes to optimize performance of relational data warehouses
  • The PostgreSQL Global Development Group, PostgreSQL 9.1.24 Documentation, Chapter 18: Server Configuration, 18.4 Resource Consumption (2018)
  • R. Stair et al., Principles of Information Systems (2010)
  • R.K. Lahti, Group Decision Making within the Organization: Can Models Help? (1999)
  • V. Lofti et al., Decision Support Systems for Operations Management & Management Science (1996)
  • R.G. Coyle, Decision Analysis (1972)
  • B.D. Prescott, Effective Decision-Making (1980)
  • ISO - International Organization for Standardization, ISO 10007: Quality Management - Guidelines for Configuration Management (2017)

    Ana Carolina Almeida is an adjunct professor at the State University of Rio de Janeiro (UERJ), located in the Department of Informatics and a judicial analyst with a specialization in Informatics at the Federal Regional Court of the 2nd Region (TRF2), acting as DBA. Post-doctorate with an emphasis on Big Data at the Federal University of Rio de Janeiro (UFRJ) (Oct / 2014 to Apr / 2015). PhD in Informatics with a specialization in BD Tuning from the Pontifical Catholic University of Rio de Janeiro (PUC-Rio) (2013). Master in Systems and Computing from the Military Institute of Engineering (IME / RJ) (2006). Researcher in the database area with an emphasis on: (i) DB (auto) tuning systems and (ii) ontologies. She was a consultant in database and taught courses at Petrobras University. Details in: http://lattes.cnpq.br/8306729029606464.

    Fernanda Baião is a professor at the Department of Industrial Engineering of the Pontifical Catholic University of Rio de Janeiro (PUC-Rio). She holds DSc (2001) and MSc (1997) degrees in Systems and Computer Engineering from COPPE/UFRJ, Brazil, and her research topics are in the areas of Data Science, Conceptual Modeling and Ontologies, Business Process Management (BPM) and Semantic Data Integration. Fernanda has authored more than 150 peer-reviewed publications, participated and coordinated national and international research projects, and developed valuable expertise in knowledge transfer between the Academy and Industry in R&D projects on Data Science, BPM, Business Architecture, Data Management and Information Security, in the fields of Oil and Gas Exploration and Production, Insurance, IT Service Management and Fraud Prediction. Details in: http://lattes.cnpq.br/5068302552861597.

    Sérgio Lifschitz is an associate professor at the Department of Informatics of the Pontifical Catholic University of Rio de Janeiro (PUC-Rio). He holds a doctoral degree in Informatics from École Nationale Supérieure des Télécommunications, ENST Paris, France (1994), and MSc (1990) and BSc (1986) degrees in Electrical Engineering, both from PUC-Rio. His main research areas involve (i) autonomous computing and database systems and (ii) tools and database systems for bioinformatics applications. Details in: http://lattes.cnpq.br/8164403687403639.

    Daniel Schwabe is a full professor (retired) at the Department of Informatics of the Pontifical Catholic University of Rio de Janeiro (PUC-Rio). He received a PhD in Computer Science from the University of California, Los Angeles (1981), an MSc in Computer Science from the Pontifical Catholic University of Rio de Janeiro (PUC-Rio) (1976) and a BSc in Mathematics from the Pontifical Catholic University of Rio de Janeiro (PUC-Rio) (1975). His research focuses on model-driven design and implementation of Socio-Technical Systems, seen as human-machine teams that solve problems, with a focus on Knowledge Graphs and the inclusion of human values. Details in: http://lattes.cnpq.br/5842794652841557.

    Maria Luiza M. Campos is a professor at the Department of Computer Science, Federal University of Rio de Janeiro (UFRJ). She coordinates the Knowledge Engineering Group (GRECO Group), supervising master's and doctoral students at the Post-graduate Program in Informatics (PPGI/UFRJ) in the same university. She received a PhD in Information Systems from the University of East Anglia (1993), an MSc in Computer Engineering from the Federal University of Rio de Janeiro (1984) and a Bachelor's degree in Civil Engineering from the Federal University of Rio Grande do Sul (1978). Her main area of interest is heterogeneous information integration, focusing on metadata, ontologies, conceptual modeling, databases, data warehousing and the semantic web. Main application domains include bioinformatics, e-gov, e-science and emergency systems. Details in: http://lattes.cnpq.br/0659658820912418.
