A cost-based buffer replacement algorithm for object-oriented database systems

Communicated by Ahmed Elmagarmid
https://doi.org/10.1016/S0020-0255(01)00116-5

Abstract

Many object-oriented database systems manage object buffers to provide fast access to objects. Traditional buffer replacement algorithms based on fixed-length pages simply assume that the cost incurred by operating a buffer is proportional to the number of buffer faults. However, this assumption no longer holds in an object buffer, where objects are of variable lengths and the cost of replacing an object varies from object to object.

In this paper, we propose a cost-based replacement algorithm for object buffers. The proposed algorithm replaces the objects that have minimum costs per unit time and unit space. The cost model extends the previous page-based one to include the replacement costs and the sizes of objects. The performance tests show that the proposed algorithm is almost always superior to the LRU-2 algorithm and, when significant replacement cost is involved, is more than twice as fast.

Introduction

Many traditional database management systems (DBMSs) manage their buffers in physical units called pages. In these systems, all fetches and replacements are done in units of pages. This buffer management scheme is called page-based buffering [10]. On the other hand, most object-oriented DBMSs (OODBMSs) manage their buffers in two separate areas called an object buffer and a page buffer [10], [11]. The object buffer manages objects as logical units; the page buffer manages the physical pages containing the objects in the disk storage format. The objects residing in the object buffer are copied from the page buffer and are forced back to the page buffer when they are replaced. Actual disk operations are performed in the page buffer. This scheme is called dual buffering [10], [11], [19].

Many replacement algorithms have been proposed for page-based buffering. Typical algorithms are LRU [6], CLOCK [8], and LRU-k [13]. Replacement is the process of choosing a page or pages that will be removed from the buffer and written back to the disk in order to make space available for new pages [6]. A common strategy of replacement algorithms is to select the page that is least likely to be referenced in the near future. LRU estimates the time of the next reference to a page using the time of the last reference to that page [6]. LRU-k, a generalization of LRU, uses the time of the kth most recent reference [13]. CLOCK is a simple approximation of LRU that uses reference bits [8].
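The difference between LRU and LRU-k can be made concrete with a minimal Python sketch. It is an illustration only, not code from the paper or from [13]: the class name `LRUKBuffer`, the logical clock, and the tie-breaking rule for pages with fewer than k recorded references are all assumptions made here. With k=1 the policy reduces to plain LRU.

```python
from collections import defaultdict, deque


class LRUKBuffer:
    """Illustrative page buffer that evicts by the k-th most recent reference time."""

    def __init__(self, capacity, k=2):
        self.capacity = capacity
        self.k = k
        self.time = 0                     # logical clock: one tick per reference
        self.resident = set()             # pages currently in the buffer
        # Up to k most recent reference times per page (kept after eviction).
        self.history = defaultdict(lambda: deque(maxlen=k))

    def _kth_last_reference(self, page):
        refs = self.history[page]
        # Assumption: a page with fewer than k references counts as infinitely old.
        return refs[0] if len(refs) == self.k else float("-inf")

    def reference(self, page):
        """Reference a page and return True if this caused a buffer fault."""
        self.time += 1
        fault = page not in self.resident
        if fault:
            if len(self.resident) >= self.capacity:
                # Victim: oldest k-th most recent reference; ties broken by
                # the oldest last reference (plain LRU).
                victim = min(
                    self.resident,
                    key=lambda p: (self._kth_last_reference(p), self.history[p][-1]),
                )
                self.resident.remove(victim)
            self.resident.add(page)
        self.history[page].append(self.time)
        return fault
```

For the reference sequence 1, 1, 2, 3 and a two-page buffer, LRU (k=1) evicts page 1 because page 2 was touched more recently, whereas LRU-2 evicts page 2 because it has been referenced only once.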

However, there has been little research in the area of object buffer management in dual buffering. Many OODBMSs rely on naive algorithms or simply apply techniques used in page-based algorithms. ORION [11] and UniSQL [18] use a garbage collection technique that replaces all unused objects in the object buffer. This technique runs the risk of replacing objects that are likely to be referenced again. Versant [17] and GemStone [4] use LRU for replacing objects in the object buffer.

There are two characteristics that differentiate an object buffer from a page buffer. In contrast to the case of the page buffer, where pages are of a fixed length and the cost of replacing a page is constant, objects in the object buffer are of variable lengths and the cost of replacing an object varies depending on the object. In this paper, we define the replacement cost as the cost incurred by replacing an object. We note that the replacement cost does not include the cost of fetching the new object. The fetching cost is accounted for separately. We assume CPU computation time is negligible compared with disk access time and define the cost as the number of disk accesses.

These characteristics of the object buffer have been overlooked by conventional replacement algorithms. Nevertheless, they affect the performance of buffer management significantly. First, the length of an object being replaced determines the length of available space to accommodate the object being fetched. Hence, if an object being replaced is too small to allow the placement of the object being fetched, more objects need to be replaced, which degrades the performance of buffer management. Second, the replacement costs of objects vary depending on whether the objects have been updated or not. In particular, if an attribute of an object on which an index is defined has been modified, the cost of updating the index is included in the replacement cost [14], [20].
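Both effects can be illustrated with a small Python sketch. The cost constants, the `BufferedObject` record, and the rule that a clean object costs nothing to replace are hypothetical choices made for illustration; the paper's actual cost functions are those of its Appendix A and are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical costs, measured in disk accesses (assumed values).
WRITE_BACK_COST = 1      # forcing an updated object back to its page
INDEX_UPDATE_COST = 2    # updating one index on a modified attribute


@dataclass
class BufferedObject:
    oid: str
    size: int                        # bytes occupied in the object buffer
    dirty: bool = False              # updated since it was fetched
    modified_indexed_attrs: int = 0  # modified attributes that carry an index


def replacement_cost(obj):
    """Assumed cost of replacing obj; clean objects are simply dropped."""
    if not obj.dirty:
        return 0
    return WRITE_BACK_COST + INDEX_UPDATE_COST * obj.modified_indexed_attrs


def evict_until_fits(free_space, victims, incoming_size):
    """Evict victims in the given order until the incoming object fits.

    Returns the evicted objects and their total replacement cost.
    """
    evicted, total_cost = [], 0
    for obj in victims:
        if free_space >= incoming_size:
            break
        free_space += obj.size
        total_cost += replacement_cost(obj)
        evicted.append(obj)
    if free_space < incoming_size:
        raise ValueError("incoming object does not fit even after all evictions")
    return evicted, total_cost
```

With these assumed constants, making room for a 250-byte object by evicting three 100-byte objects (one clean, one updated, one updated on an indexed attribute) costs 4 disk accesses, whereas evicting a single clean 300-byte object costs none.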

In a relational DBMS, every update to an indexed column of a tuple is followed by an update to the index. However, to avoid repeated index updates, an OODBMS can defer the index update until the updated object is replaced and forced back to the page buffer. Thus, updating an object merely updates the copy residing in the object buffer until the object is replaced.
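A minimal Python sketch of deferred index maintenance follows. The class `DeferredIndexObjectBuffer` and its dictionary-based index are hypothetical stand-ins, not the mechanism of any particular OODBMS; the point is only that repeated updates to an indexed attribute lead to a single index update at replacement time.

```python
class DeferredIndexObjectBuffer:
    """Illustrative object buffer that defers index updates until replacement."""

    def __init__(self):
        self.objects = {}       # oid -> current value of the indexed attribute
        self.indexed_as = {}    # oid -> value under which the index lists the object
        self.index = {}         # value -> set of oids (stand-in for a real index)
        self.index_updates = 0  # number of index updates actually performed

    def load(self, oid, value):
        """Bring an object into the buffer; it is already indexed under value."""
        self.objects[oid] = value
        self.indexed_as[oid] = value
        self.index.setdefault(value, set()).add(oid)

    def update(self, oid, value):
        """Change only the in-buffer copy; the index is not touched."""
        self.objects[oid] = value

    def replace(self, oid):
        """Evict the object, applying at most one index update."""
        value = self.objects.pop(oid)
        old = self.indexed_as.pop(oid)
        if value != old:
            self.index[old].discard(oid)
            self.index.setdefault(value, set()).add(oid)
            self.index_updates += 1
```

In this sketch the deferred work is what makes the replacement cost of an updated object larger than that of an unmodified one.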

In this paper, we propose a cost-based replacement algorithm for object buffers. The cost model generalizes the model of previous page-based algorithms such as LRU and LRU-k to take into account the variable lengths of objects and replacement costs. The new algorithm replaces the object that has the minimum cost per unit time and unit space.
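The selection rule can be sketched in Python as follows. The formula used here is an assumption made for illustration, since this excerpt does not state the paper's exact definition: the unit cost of an object is taken to be its fetch cost plus its replacement cost, divided by the product of its size and its estimated time until the next reference. The names `Candidate`, `unit_cost`, and `choose_victim` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    oid: str
    size: int                # space occupied in the object buffer
    fetch_cost: float        # disk accesses to bring the object back later
    replacement_cost: float  # disk accesses incurred by replacing it now
    forward_distance: float  # estimated time until the next reference


def unit_cost(c):
    """Assumed cost per unit time and unit space of replacing candidate c."""
    return (c.fetch_cost + c.replacement_cost) / (c.forward_distance * c.size)


def choose_victim(candidates):
    """Return the candidate with the minimum unit cost."""
    return min(candidates, key=unit_cost)
```

Under this assumed formula, when all objects have the same size and the same costs, the victim is simply the object with the largest estimated forward distance, which is the choice a page-based policy such as LRU or LRU-k tries to approximate.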

The idea of a cost-based replacement algorithm has also been introduced in other areas. Sinnwell and Weikum have proposed a cost-based algorithm for distributed caching in a Network of Workstations (NOW) environment, which takes into account the cost of fetching objects and their variable lengths to minimize network delay [15]. Aggarwal et al. have introduced a cost-based replacement algorithm for the web cache [1], and Scheuermann et al. have introduced one for the data warehouse [16]. Nevertheless, these approaches do not account for the replacement cost, which is a major factor in buffering in OODBMSs. The algorithm proposed in this paper is the first cost-based algorithm for OODBMSs; it takes into account not only the cost of fetching objects and their variable lengths but also the replacement costs, and has been developed independently of other algorithms.

The paper is organized as follows. Section 2 briefly reviews the underlying concepts of replacement algorithms. Section 3 proposes our cost-based replacement algorithm. Section 4 evaluates the performance of the proposed algorithm. Finally, Section 5 summarizes the contributions and concludes the paper.

Section snippets

Basic concepts of replacement algorithms

In this section, we briefly overview the basic concepts of conventional page-based algorithms. We first define some terminology. Assuming $N = \{1, \ldots, n\}$ is a set of disk pages in the database, a reference string [6], [13] is a sequence of references to these pages and is described as $w = r_1, r_2, \ldots, r_t, \ldots$, where $r_t \in N$. The statement $r_t = x$ means that page $x$ is referenced at the $t$th reference. Regarding $t$ as a point in a discrete time domain, we can say that page $x$ is referenced at time $t$.
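The notation can be made concrete with a short Python sketch. It represents a finite prefix of a reference string as a list and computes the forward distance of a page, using the standard definition (the number of references from time t until the page is next referenced), which is assumed here because this excerpt does not spell it out. The function name `forward_distance` is hypothetical.

```python
import math


def forward_distance(w, t, x):
    """Distance from time t to the next reference to page x in w.

    w is a finite prefix of a reference string and times are 1-indexed,
    so w[t - 1] is the page referenced at time t. Returns math.inf if x
    is not referenced after time t within w.
    """
    for s in range(t + 1, len(w) + 1):
        if w[s - 1] == x:
            return s - t
    return math.inf
```

For example, in the reference string 1, 2, 3, 1, 2, page 1 is referenced at times 1 and 4, so its forward distance at time 1 is 3. A replacement algorithm cannot observe this value at run time and must estimate it from past references.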

A buffer fault occurs

The concept

In this section, we propose an object buffer replacement algorithm that extends the DCS algorithm to incorporate the sizes of objects and their replacement costs. We denote the cost functions $C_{\mathrm{fetch}}(x)$ and $C_{\mathrm{rep}}(x)$ as the costs of fetching and replacing the object $x$. The function $\mathrm{size}(x)$ represents the size of the object $x$. The cost functions are determined by the operations executed on the object, by the availability of indexes defined on the attributes of the object, and by the low-level

Performance evaluation

In this section, we compare the performance of Cobra, DCS, Cobra-2, and LRU-2. Cobra and DCS use the static reference probabilities of objects; Cobra-2 and LRU-2 use the dynamically changing information about the last two references to objects to estimate the expected forward distance of objects. Both Cobra and Cobra-2 use the cost functions described in Appendix A to estimate the unit costs of objects. We obtain the static reference probabilities of objects for Cobra and DCS by counting the

Conclusions

In this paper, we have proposed a cost-based replacement algorithm, Cobra, for object buffers. The cost-based model extends Denning, Chen, and Shedler's model [7], which provides a theoretical basis for the LRU-k algorithm, to incorporate replacement and fetch costs. Based on the model, Cobra replaces an object that has the minimum unit cost.

Performance results show that Cobra-2 outperforms LRU-2. In read-only traversals, Cobra-2 has an advantage since it takes into account the sizes of

References (20)

  • C. Aggarwal et al., Caching on the world wide web, IEEE Trans. Knowledge Data Eng. (1999)
  • A.V. Aho et al., Principles of optimal page replacement, J. ACM (1971)
  • L.A. Belady, A study of replacement algorithms for virtual storage computers, IBM Syst. J. (1966)
  • P. Butterworth et al., The GemStone object database management system, Comm. ACM (1991)
  • M.J. Carey, D.J. DeWitt, J.F. Naughton, The OO7 benchmark, in: Proceedings of the International Conference on...
  • E.G. Coffman et al., Operating Systems Theory (1973)
  • P.J. Denning, Y.C. Chen, G.S. Shedler, A model for program behavior under demand paging, Research Report RC-2301, IBM...
  • W. Effelsberg et al., Principles of database buffer management, ACM Trans. Database Syst. (1984)
  • T. Johnson, D. Shasha, 2Q: A low overhead high performance buffer management replacement algorithm, in: Proceedings of...
  • A. Kemper, D. Kossmann, Dual-buffering strategies in object bases, in: Proceedings of the International Conference on...
There are more references available in the full text version of this article.


This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the Advanced Information Technology Research Center (AITrc).
