Elsevier

Parallel Computing

Volume 26, Issue 11, October 2000, Pages 1491-1513

Semantic partitioning as a basis for parallel I/O in database management systems

https://doi.org/10.1016/S0167-8191(00)00041-7

Abstract

Modern applications such as ‘video on demand’ require fast reading of complete files, which can be supported well by file striping. Many conventional applications, however, are only interested in some part of the available records. In order to avoid reading attributes irrelevant to such applications, each attribute could be stored in a separate (transposed) file. Aiming at I/O parallelism, byte-oriented striping could be applied to transposed files. However, such fragmentation ignores the semantics of data. It cannot be optimized by a database management system (DBMS) because a DBMS has to perform its tasks on the basis of data semantics. For example, queries must be translated into file operations using a scheme that maps a data model to a file system.

However, details about files, such as the striping width, are invisible to a DBMS. Therefore, we propose to store each transposed file related to a composite type on a separate, independent disk drive, which means I/O parallelism tuned to a data model. As we also aim at system reliability and data availability, each transposed file must be duplicated on another drive. Consequently, a DBMS also has to guarantee correctness and completeness of the allocation of transposed files within an array of disk drives. As a solution independent of the underlying data model, we propose an abstract framework consisting of a meta model and a set of rules.

Introduction

Access to secondary storage units is still a bottleneck for data processing in information systems. In particular, relational database systems encounter this performance problem when dealing with expensive operations such as the join operation. Although a permanent main-memory storage system improves performance [10], we attempt to find a generic solution for large databases that do not fit entirely in main memory and are managed by a DBMS. Therefore, we focus on parallel disk access as a solution to this performance problem.

An array of independent disk drives in combination with some kind of redundancy – either replication (mirroring, RAID level 1) or file striping in combination with additional data for checking – can offer better performance, availability and reliability as well as lower costs than mainframe storage units [24]. The performance of RAID systems depends on the size of the data addressed by the dominating transactions and the size of file striping units [13], [17], [24], [41]. According to Scheuermann et al. [26], very small striping units lead to a balanced distribution of the workload and a good response time when the workload is low. However, when a high throughput during high workloads is desired, larger striping units must be applied. The optimal striping width (number of disks per file) also depends on the workload [40]. In general, these solutions can support fast access to complete records or record files as desired in modern (multi-media) applications such as ‘video on demand’.
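The roles of the striping unit and the striping width can be illustrated with a minimal sketch. It assumes plain round-robin striping without parity; the function names `stripe_location` and `disks_touched` are hypothetical and do not come from the cited work.

```python
def stripe_location(offset: int, unit: int, width: int) -> tuple[int, int]:
    """Map a byte offset in a file to (disk index, byte offset on that disk).

    Assumes round-robin striping: `unit` bytes per striping unit,
    `width` disks per file, no parity blocks.
    """
    if offset < 0 or unit <= 0 or width <= 0:
        raise ValueError("offset must be >= 0; unit and width must be > 0")
    unit_no = offset // unit   # index of the striping unit within the file
    disk = unit_no % width     # units are dealt out to the disks in turn
    row = unit_no // width     # number of units already placed on that disk
    return disk, row * unit + offset % unit


def disks_touched(offset: int, length: int, unit: int, width: int) -> set[int]:
    """Return the disks involved in reading `length` bytes from `offset`."""
    if length <= 0:
        raise ValueError("length must be > 0")
    first = offset // unit
    last = (offset + length - 1) // unit
    return {u % width for u in range(first, last + 1)}
```

Under these assumptions, an 8-byte read with a 4-byte unit on three disks involves two disks, whereas the same read with a 64-byte unit involves only one. This is the trade-off described above: small units spread a single request over several drives, while large units leave the other drives free for concurrent requests.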

However, when applications are only interested in parts of records, record-oriented solutions, even if they apply parallel I/O, inevitably lead to the reading of irrelevant data. In order to avoid this superfluous work [15], we can store attribute values in separate (transposed) attribute files. We consider three reasonable solutions for the storage of transposed files:

1. Transposed files can be stored on a single disk drive [6], [7], [29]. This solution is currently applied in the centralized Xplain-DBMS [34], [37], [38], [39]; it circumvents the problem of finding an optimal clustering of attributes [16], [19], [42] and does not need to be revised when the transaction pattern changes. This solution can also be applied when the transaction pattern is unknown. However, a disadvantage is the cost of inserting a single object (instance, record or tuple): the time spent during successive disk accesses is proportional to the number of attributes [18]. The same applies to retrieving more than one attribute from a composite type and also to conditional retrievals requiring the evaluation of two or more attributes.

2. A transposed file can be stored as fragments distributed in an array of disk drives using file striping. An advantage is that existing technology can be applied. However, this solution is byte oriented: it ignores data semantics and cannot be applied by a DBMS because in the classical layered database architecture a DBMS functions on the basis of system tables [16]. Some tables deal with the mapping between data model and file system. Using these tables, queries are transformed into file operations. However, in the layered database architecture further details about files are unknown to a DBMS. Consequently, even though byte-oriented striping of transposed files might function very well, it cannot be optimized by a DBMS.

3. A transposed file can be stored on one of the drives of a disk array. As in solution 1, the problem of finding an optimal clustering of attributes is avoided. The unit of storage complies well with the data model and only the allocation of transposed files needs to be considered when query optimization by a DBMS is desired. Although the third solution might not be optimal, it can be managed by a DBMS on the basis of data semantics.
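As an illustration of what a transposed file holds, the following minimal sketch keeps one value list per attribute and reads only the lists a query needs. It works in memory only; the attribute names in the usage note are hypothetical, and `transpose` and `project` are illustrative helpers, not part of the Xplain DBMS.

```python
from typing import Any


def transpose(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Turn a list of records into one value list per attribute.

    Each value list stands in for one transposed (attribute) file;
    position i in every list belongs to the same instance.
    """
    if not records:
        return {}
    attributes = list(records[0])
    return {a: [r[a] for r in records] for a in attributes}


def project(transposed: dict[str, list[Any]], wanted: list[str]) -> list[dict[str, Any]]:
    """Rebuild partial records from only the attribute lists a query asks for."""
    columns = (transposed[a] for a in wanted)
    return [dict(zip(wanted, row)) for row in zip(*columns)]
```

For example, after `t = transpose([{"name": "A", "salary": 1}, {"name": "B", "salary": 2}])`, the call `project(t, ["salary"])` reads a single list and never touches the `name` values. Conversely, inserting one record appends to every list, which mirrors the insertion cost mentioned for solution 1.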

If data about queries and their frequency is registered, then some degree of optimization also becomes possible. Then a DBMS can detect an unbalanced workload of disk drives and might reallocate some of the transposed files. We prefer the third solution because it performs better than the first solution and unlike the second solution it can be managed by a DBMS.
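How registered query frequencies might reveal an unbalanced workload can be sketched as follows. This is only an assumption-laden illustration: the load measure (one access per attribute per query execution), the `tolerance` threshold and the function names are hypothetical and are not taken from the paper.

```python
def drive_load(query_freq: dict[tuple[str, ...], int],
               placement: dict[str, int]) -> dict[int, int]:
    """Sum, per drive, how often its transposed files are accessed.

    `query_freq` maps the attributes a query reads to its frequency;
    `placement` maps each attribute (transposed file) to a drive number.
    """
    load = {drive: 0 for drive in placement.values()}
    for attributes, freq in query_freq.items():
        for attribute in attributes:
            load[placement[attribute]] += freq
    return load


def is_unbalanced(load: dict[int, int], tolerance: float = 1.5) -> bool:
    """Flag the array when the busiest drive exceeds `tolerance` times the mean load."""
    if not load:
        return False
    mean = sum(load.values()) / len(load)
    return max(load.values()) > tolerance * mean
```

A DBMS that finds `is_unbalanced(...)` to be true could then consider moving a frequently read transposed file to a less busy drive, subject to the allocation rules of the framework.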

In order to improve both system reliability and data availability we also propose to duplicate each transposed file on a separate drive, using one disk controller and one cache memory per drive. A drive should not contain more than one of the transposed files associated with a composite type. This enables a DBMS to split small write operations (i.e., inserting a composite object) into parallel disk accesses, leading to a faster insertion rate than the higher RAID levels (level > 1) can offer [13], [24]. These levels require additional operations for reading, constructing and rewriting parity data for each insertion. Both the storage of calculated parity data and data duplication are precautions against the risk of disk failures, which increases with the number of disk drives [24].
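The two placement rules stated above can be checked mechanically. The sketch below is a minimal illustration under a strict reading, assumed here, in which duplicates count as transposed files of their composite type; the `Allocation` layout and the `violations` function are hypothetical and do not reproduce the meta model of the paper.

```python
from collections import defaultdict

# (composite type, attribute) -> drives holding the transposed file and its copy
Allocation = dict[tuple[str, str], list[int]]


def violations(allocation: Allocation) -> list[str]:
    """List every breach of the two placement rules.

    Rule 1: each transposed file is stored on at least two distinct drives.
    Rule 2 (assumed strict reading): a drive holds at most one transposed
    file, original or duplicate, per composite type.
    """
    problems: list[str] = []
    per_drive: dict[tuple[int, str], list[str]] = defaultdict(list)
    for (ctype, attribute), drives in allocation.items():
        distinct = set(drives)
        if len(distinct) < 2:
            problems.append(f"{ctype}.{attribute}: needs a copy on a second drive")
        for drive in distinct:
            per_drive[(drive, ctype)].append(attribute)
    for (drive, ctype), attributes in sorted(per_drive.items()):
        if len(attributes) > 1:
            problems.append(f"drive {drive}: holds {len(attributes)} files of type {ctype}")
    return problems
```

With two attributes of a hypothetical type `employee` placed on drives `[0, 2]` and `[1, 3]`, the function returns an empty list, and an insertion can write to all four drives in parallel.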

A remaining problem is the consistency of the proposed allocation of transposed files. In the case of replicating record files (mirroring, RAID level 1) this does not seem to be a problem. Both byte-oriented file striping and file mirroring can be hidden from programmers [21], [22], [44], because these techniques ignore data semantics.

In previous work [1], [2], [4], based on the concepts of the Xplain DBMS [25], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], we described a framework for the correctness of simple kinds of horizontal fragmentation in geographically distributed environments.

Supplementary to that, the present paper describes a generic solution for the correctness of the proposed local distribution of transposed files and their duplicates on independent disk drives in a system possibly being part of a distributed system. This solution will be specified in abstract terms (concepts), so it is independent of the chosen programming language and the underlying data model. Before introducing these concepts in Section 3, we describe in Section 2 our conceptual foundation: the semantic concepts of Xplain. A pragmatic reason to apply these concepts is improving the performance of the Xplain DBMS. Another reason is the analytical strength of semantic abstraction hierarchies [5], [38]. Section 2 discusses the following subjects related to the Xplain approach:

  • Concepts for data modeling (Section 2.1).

  • Data manipulation concepts (Section 2.2).

  • Static restrictions (assertions) (Section 2.3).

  • The evaluation of assertions (Section 2.4).

  • Implementation aspects (Section 2.5).

Concepts for data modeling

Data modeling in Xplain is based on the concepts of ‘type’ and ‘attribute’. For example, the following simplified model of a large sales organization (Fig. 1) contains a composite type ‘employee’ having five attributes. We do not show the value domains of types because they are not relevant to the problems discussed here. Inherent to a data model, each instance of a composite type has a single identification [20]. According to the principle of convertibility [30] this instance is also

A framework for a correct local distribution of transposed files

A DBMS has to manage the allocation of semantically related transposed files and their duplicates to different drives on the basis of a correct data distribution scheme. Criteria for the correctness of data distribution schemes can be derived from the textbooks of Ceri and Pelagatti [11] and Özsu and Valduriez [23]:

 disjointness: fragments may not overlap
 completeness: each data element belongs to exactly one fragment
 minimal allocation: each fragment has at least one storage location
 reconstruction
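The first three criteria, as worded above, can be expressed as simple set checks. The sketch below is a minimal illustration and not the rule set of the framework; the data layout and the name `check_distribution` are assumptions, and reconstruction is left out because its definition is not given in this excerpt.

```python
from collections import Counter


def check_distribution(elements: set[str],
                       fragments: dict[str, set[str]],
                       locations: dict[str, set[int]]) -> dict[str, bool]:
    """Evaluate three correctness criteria for a data distribution scheme.

    `fragments` maps a fragment name to the data elements it contains;
    `locations` maps a fragment name to its storage locations.
    """
    # How many fragments each data element appears in.
    membership = Counter(e for members in fragments.values() for e in members)
    return {
        # No element may occur in more than one fragment.
        "disjointness": all(n <= 1 for n in membership.values()),
        # Every known element must occur in exactly one fragment.
        "completeness": all(membership[e] == 1 for e in elements),
        # Every fragment needs at least one storage location.
        "minimal allocation": all(len(locations.get(f, ())) >= 1 for f in fragments),
    }
```

A scheme passes when all three values are true; a fragment without any registered location, for instance, fails only the minimal allocation check.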

Additional consistency rules

Applying the meta model of Fig. 4, a DBMS can execute the following activities for the specification of the proposed local data allocation in the following order:

  • The registration of a correct number of drives and directories (at least one directory per drive).

  • The registration of a correct number of reserved drives and files per composite type.

  • The registration of one reserved file for each reserved drive and for each file.

  • The registration either of one type file, one index file or one attribute

Discussion

The proposed local distribution of transposed files and their copies within an array of independent disk drives is based on the semantics of data and has the following advantages over storing transposed files on a single disk drive:

  • Improved performance through I/O parallelism within and between transactions.

  • Improved system reliability.

  • Improved data availability.

  • Faster and automatic registration of backups.

These advantages cannot be obtained without increased system complexity, which introduces

Acknowledgements

Thanks are due to the anonymous referees for their valuable comments, contributing significantly to improving the present paper.

References (44)

  • N. Gorla et al.

    Effect of schema size on fragmentation design in multirelational databases

    Information Systems

    (1990)
  • P. Triantafillou et al.

    Overlay striping and optimal parallel I/O for modern applications

    Parallel Computing

    (1998)
  • J.A. Bakker

    A unifying approach to the modeling of object association based on a common property

  • J.A. Bakker

    A semantic approach to enforce correctness of data distribution schemes

    The Computer Journal

    (1994)
  • J.A. Bakker, Object-orientation based on semantic transformations, in: R.R. Wagner, H. Thoma (Eds.), Proceedings...
  • J.A. Bakker, An extended meta model for conditional fragmentation, in: G. Quirchmayr, E. Schweighofer, T.J.M....
  • J.A. Bakker, Advantages of a hierarchical presentation of data structures, in: J. Filipe, J. Cordeiro (Eds.),...
  • D.S. Batory

    On searching transposed files

    ACM Transactions on Database Systems

    (1979)
  • D.S. Batory et al.

    A unifying model of physical databases

    ACM Transactions on Database Systems

    (1982)
  • R. Bayer et al.

    Organization and maintenance of large ordered indexes

    Acta Informatica

    (1972)
  • R. Bayer et al.

    Prefix-B-trees

    ACM Transactions on Database Systems

    (1977)
  • P.A. Boncz, S. Manegold, M.L. Kersten, Database Architecture Optimized for the new Bottleneck: Memory Access, in: M.P....
  • S. Ceri et al.

    Distributed Databases: Principles and Systems

    (1984)
  • E.E. Chang et al.

    Exploiting inheritance and structure semantics for effective clustering and buffering in an object-oriented DBMS

    SIGMOD Record

    (1989)
  • P.M. Chen et al.

    RAID: High-performance reliable secondary storage

    ACM Computing Surveys

    (1994)
  • D. Comer

    The Ubiquitous B-tree

    ACM Computing Surveys

    (1979)
  • D.W. Cornell et al.

    An effective approach to vertical partitioning for physical design of relational databases

    IEEE Transactions on Software Engineering

    (1990)
  • C.J. Date

    An Introduction to Database Systems

    (1995)
  • G.R. Ganger et al.

    Disk arrays: high-performance, high-reliability storage subsystems

    IEEE Computer

    (1994)
  • M. Hammer et al.

    A heuristic approach to attribute partitioning

    SIGMOD Record

    (1979)
  • S.N. Khoshafian et al.

    Object Identity

    SIGPLAN Notices

    (1986)
  • L.L. Miller et al.

    Multiprogramming and concurrency in the parallel file environment

    International Journal of Mini and Microcomputers

    (1991)