Efficient handling of tuples with embedded large objects

https://doi.org/10.1016/S0169-023X(99)00040-3

Abstract

Modern database systems and storage manager toolkits usually provide a large object abstraction. Very often large objects are not used as standalone entities, but rather embedded within an aggregate of different types, i.e. a tuple. Depending on the large object's size and access probability, query performance is determined by the representation of the large object: either inlined within the aggregate or swapped out to a separate object. This paper describes a sound and general large object interface extension which automatically switches the representation of large objects according to their actual size. The optimum threshold size for switching the large object's representation is determined, based upon a linear cost model. Furthermore, a SHORE-based implementation and its performance are presented. It turns out that switching the representation of large objects yields great performance improvements for objects whose size is varying from quite small to large.

Introduction

Conventional relational database systems cannot meet the requirements of modern database application domains like GIS, CAD, spatial and spatio-temporal information systems, or the large and still growing field of multimedia processing. As a result of the substantial research effort devoted to overcoming the limitations of relational database systems, new data models and systems implementing these models have arisen, e.g. object-relational and object-oriented models and systems.

A common challenge for almost all new systems is the need to handle large objects, whose representations on secondary storage exceed a single disk page. Fortunately, building systems for advanced applications is supported by the storage manager components of toolkits like EXODUS [6] or SHORE [5], which provide all the typical DBMS features like transaction management, multiuser access control, and logging and recovery. These storage managers have proven to be reliable and efficient in handling standard data types as well as large objects.

Another approach to building systems for new application domains is extending an existing database system frame – usually tightly bound to a specific data model – by new abstract data types which can be used in the same way as standard data types. Access methods for new types have to be provided by the data type implementor using a well-defined application programming interface (API) to handle persistent data. This interface usually includes methods for access to large objects. Examples of such extensible systems are the commercially available Informix Universal Server or the SHORE-based PREDATOR system [14].

Work on large objects started with System R [1], supporting long fields of up to 32 Kbytes. Partial access to long fields was not supported. Later on, Haskin and Lorie [8] presented an advanced mechanism to handle long fields of a maximum size of about 2 Mbytes.

The Wisconsin storage system (WiSS) [7] provided partial access to large objects which could grow up to 1.6 Mbytes. Hence there is still a size limit predefined by the storage manager rather than one depending only on the underlying hardware and operating system configuration. A common drawback of these systems is the loss of sequentiality at the physical level, giving rise to high access costs for objects stored on several pages.

EXODUS [6] not only overcame these limitations, but also introduced efficient handling of `large' objects which actually may be small, being represented by only a fractional amount of a disk page. The same holds, e.g. for the storage manager component of SHORE [5], the successor of EXODUS, and large objects (LOBs) in DB2 [10].

Several other approaches to the management of large objects have been explored, each of them emphasizing different properties. For instance, Starburst provides efficient read and append operations [11], EOS enables efficient partial insertion and deletion even in the middle of an object [2], [3], and BeSS [4] as well as Fellini [12] support the special requirements arising from multimedia applications.

In this paper we address the usage of large objects as they are offered by those systems. We do not propose any direct improvement of large object implementations, but rather introduce a new software layer on top of large object abstractions. Often large objects are not stand-alone entities, but components of a comprising structure that aggregates the large objects with other objects. Our approach enhances the efficiency of reading such aggregates without affecting any property of the large object implementation of the underlying storage manager. Thus we can still use its features concerning transaction management, concurrency control, logging and recovery, and handling of large objects.

The interface to a large object typically provides a minimum set of methods, including creation, deletion, resizing, reading, and updating large objects. The possibility of resizing an object makes the large object abstraction attractive not only for objects that are really large, but for all objects that are variable in size. Moreover, some data type representations are potentially large, but not necessarily: an instance of a polygon data type, essentially represented by a set of vertex coordinates, may be a triangle as well as a region defined by thousands of vertices. The data type implementor should be allowed to use the same large object abstraction for all possible instantiations of polygons without loss of efficiency in case of actually small objects.
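The minimum interface described above can be sketched as a C++ class. This is a hypothetical in-memory illustration, not the actual SHORE or FLOB API; all names are ours.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the minimal large object interface: create, destroy (via the
// destructor), resize, read, and write. A std::vector stands in for the
// persistent byte sequence managed by a real storage manager.
class LargeObject {
public:
    explicit LargeObject(std::size_t size = 0) : bytes_(size) {}

    void        Resize(std::size_t newSize) { bytes_.resize(newSize); }
    std::size_t Size() const                { return bytes_.size(); }

    // Partial read: copy len bytes starting at offset into dest.
    void Read(std::size_t offset, std::size_t len, char* dest) const {
        for (std::size_t i = 0; i < len; ++i) dest[i] = bytes_[offset + i];
    }

    // Partial update: write len bytes at offset, growing the object if needed.
    void Write(std::size_t offset, std::size_t len, const char* src) {
        if (offset + len > bytes_.size()) bytes_.resize(offset + len);
        for (std::size_t i = 0; i < len; ++i) bytes_[offset + i] = src[i];
    }

private:
    std::vector<char> bytes_;
};
```

Because `Resize` is part of the interface, the same abstraction serves a three-vertex polygon and one with thousands of vertices alike.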

In case an aggregate contains embedded variable-sized objects, an important decision is how to store the aggregates and their contents:
Alternative 1. Use a single object to store the aggregation entirely. If the size of the complete aggregate is small, this is the right choice, since the object can be read via a single disk access. But if the size of any embedded large object is larger than some threshold size, e.g. a single disk page, the large object should be swapped out, only leaving a small access handle to the large object within the aggregate, since not all components of an aggregate are necessarily read when it is loaded into main memory.
Alternative 2. Store each large object within a storage space dedicated solely to the large object, leaving a reference handle within the aggregate that logically contains the large object. In case of really large objects this has the advantage of not always transferring lots of bytes from disk to main memory, even if the large object is not going to be read at all. In case of small objects, however, this strategy might force multiple disk access operations to read a single aggregate completely.
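The two alternatives correspond to two physical layouts of a variable-sized attribute. The following sketch uses hypothetical names, not the paper's actual data structures: a slot either carries its bytes inline within the tuple string, or only a small handle referring to a separately stored large object.

```cpp
#include <cstddef>
#include <vector>

// Small access handle left in the tuple when the object is swapped out
// (Alternative 2).
struct ExternalHandle {
    unsigned long objectId;  // identifier of the stand-alone large object
    std::size_t   size;      // size of the swapped-out object in bytes
};

// A variable-sized attribute in one of the two representations.
struct VarSizedAttribute {
    bool inlined;                   // true: Alternative 1, false: Alternative 2
    std::vector<char> inlineBytes;  // payload, used only when inlined
    ExternalHandle    handle;       // reference, used only when swapped out
};
```

Reading an inlined attribute costs no extra disk access beyond loading the tuple; reading a swapped-out one requires a second access, which pays off only when the object is large or rarely read.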

Consider a relational GIS containing a relation cities (name: STRING [20], population: INTEGER, area: INTEGER, shape: POLYGON). Many queries involving cities will not examine the shape attribute. This should be taken into account for the physical database design: if the size of a shape is large (because information about the shape of the actual city is detailed), it will be more efficient to vertically partition the relation in such a way that the shape value is stored externally, only loading it into main memory on demand. On the other hand, if the shape is just a rectangle it does not hurt to store the shape value within the byte string representing the tuple, thereby avoiding an additional disk access whenever the shape of a city is to be read.
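A hypothetical C++ rendering of the cities tuple makes the problem concrete; the Polygon type here is a plain vertex list, not an actual GIS data type.

```cpp
#include <vector>

// Simple vertex-list polygon: the shape attribute may hold a rectangle's
// four vertices or a detailed boundary with thousands of them.
struct Point   { double x, y; };
struct Polygon { std::vector<Point> vertices; };

// One tuple of the cities relation from the example.
struct CityTuple {
    char    name[20];
    int     population;
    int     area;
    Polygon shape;  // the only attribute whose size varies widely
};
```

The fixed-size attributes always fit on the tuple's page; only `shape` forces the choice between inline storage and vertical partitioning.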

Such situations arise not only in relational systems, but whenever single data types may be logically combined into an aggregate. Hence the mechanisms presented in this paper are not restricted to implementations of relational systems, although, for convenience, in the sequel we will use the terms `tuple' and `attribute' in place of `aggregation' and `component', respectively.

In this paper we present a mechanism to handle large objects via a simple and clean interface while automatically switching from an in-aggregation representation to a stand-alone representation and vice versa, based on the size of a large object, whenever its comprising aggregation is to be stored on disk.

The rest of this document is structured as follows. Section 2 presents the basic concepts of our approach. Section 3 deals with the analysis of tuple access costs, based upon a linear cost model, in order to find a well performing threshold size indicating whether a large object should be swapped out or stored as part of the tuple string. In Section 4 we describe a C++ implementation of our approach, using the SHORE storage manager. Its performance is presented in Section 5. Section 6 compares our approach to other related work, and Section 7 concludes the paper.

Section snippets

Basic concepts

As demonstrated by the example in Section 1, an application implementor might wish to use the large object abstraction of the underlying storage manager, while actual instances of large objects may indeed be small. In case of large objects being a component of a tuple, i.e. an attribute value or part of an attribute value, the large object implementation should not be used for writing the object to disk, but rather be replaced by a more efficient implementation which exploits the fact that the

Threshold size analysis

When saving a tuple, for each FLOB we have to decide whether it should be saved as part of the tuple byte string or by a LOB on its own. For this purpose we use a threshold size parameter. If the FLOB to be saved is larger than the threshold size, it is swapped out to a separate LOB; otherwise it is appended to the tuple string. In this section, we analyse the impact of the threshold size on tuple access costs in order to identify good threshold sizes.
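The save-time decision described above can be sketched as follows, assuming a fixed threshold parameter; the type and function names are hypothetical, not the paper's actual implementation.

```cpp
#include <cstddef>
#include <vector>

// On-disk image of a tuple: the tuple byte string holds the core attributes
// plus any small FLOBs; large FLOBs become separate LOBs.
struct SavedTuple {
    std::vector<char> tupleString;
    std::vector<std::vector<char>> lobs;
};

// Saves one FLOB: if it exceeds the threshold it is swapped out to a
// separate LOB, otherwise it is appended to the tuple byte string.
void SaveFlob(SavedTuple& t, const std::vector<char>& flob,
              std::size_t threshold) {
    if (flob.size() > threshold) {
        t.lobs.push_back(flob);
    } else {
        t.tupleString.insert(t.tupleString.end(), flob.begin(), flob.end());
    }
}
```

The whole analysis of this section amounts to choosing `threshold` so that the expected tuple access cost under this rule is minimized.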

Implementation

In this section, we first present general interfaces for the abstractions used in our approach. Thereafter, we describe the C++ implementation of these interfaces on top of the SHORE storage manager. The contribution of this section is twofold: on the one hand, we demonstrate that the FLOB concept is implementable by mapping the abstractions introduced in Section 2 to C++ classes in a straightforward manner. On the other hand, we show by example the variations and extensions to the abstract concept

Results

In this section, we report on the results of some experiments we performed using the implementation of the FLOB concept presented in Section 4. The hardware platform is a SUN SPARCstation 20 with 64 Mbytes of main memory, running the Solaris 2.5.1 operating system. We used the optimized version of SHORE 1.1.1 and the gcc 2.7.2.2 compiler with its 2.7.2 version of the libraries. The SHORE buffer pool size is set to 1 Mbyte.

Related work

Considerable research effort has been devoted to developing efficient implementations of persistent large objects. As a result, a variety of powerful large object representations is at the disposal of the implementor of a non-standard database system, all of them emphasizing different properties like efficient insertion in the middle of an object, guaranteed data throughput for sequential access, etc.

None of these large object abstractions, however, pays any attention to the storage environment of

Conclusions

The contribution of this paper is the definition of a sound and general interface providing access to a mechanism which automatically switches the representation of large objects embedded within tuples, thereby increasing tuple access performance. We deduced the optimum threshold size for single FLOBs analytically, and demonstrated the benefit of FLOB usage for a wide range of application scenarios even with one threshold size for all FLOBs, thereby demonstrating empirically the benefit of using

Stefan Dieker received his Diploma Degree (M.Sc.) in Computer Science from the University of Dortmund, Germany, in April 1996. After an intermediate employment as a software engineer, he became a research assistant in the group of R.H. Güting at the University of Hagen. His main research interests are architectures of modular extensible database systems, interfaces for database extension modules, and query optimization in modular extensible database systems.

References (16)

  • M.M. Astrahan, M.W. Blasgen, D.D. Chamberlin, K.P. Ewaran, J. Gray, P.P. Griffiths, W.F. King, R.A. Lorie, P.R....
  • A. Biliris, An efficient database storage structure for large dynamic objects, in: Proceedings of the Eighth...
  • A. Biliris, The performance of three database storage structures for managing large objects, in: M. Stonebraker (Ed.),...
  • A. Biliris, E. Panagos, The BeSS object storage manager: Architecture overview, SIGMOD Record, ACM Special Interest...
  • M.J. Carey, D.J. DeWitt, M.J. Franklin, N.E. Hall, M.L. McAuliffe, J.F. Naughton, D.T. Schuh, M.H. Solomon, C.K. Tan,...
  • M.J. Carey, D.J. DeWitt, J.E. Richardson, E.J. Shekita, Object and file management in the Exodus extensible database...
  • H.-T. Chou, D.J. DeWitt, R.H. Katz, A.C. Klug, Design and implementation of the Wisconsin Storage System, Software:...
  • R.L. Haskin, R.A. Lorie, On extending the functions of a relational database system, in: M. Schkolnick (Ed.),...


Ralf Hartmut Güting is a professor in computer science at the University of Hagen, Germany, since 1989. He received his Diploma and Dr. rer. nat. degrees from the University of Dortmund in 1980 and 1983, respectively, and became a professor at that university in 1987. From 1981 to 1984 his main research area was computational geometry. After a one-year stay at the IBM Almaden Research Center in 1985, extensible and spatial database systems became his major research interests. His group has built a prototype of an extensible spatial DBMS, the Gral System. He is an editor of the VLDB Journal and of GeoInformatica and the program chairperson of the International Symposium on Spatial Databases (SSD) in Hong Kong, China, 1999.
