Semantic partitioning as a basis for parallel I/O in database management systems
Introduction
Access to secondary storage units is still a bottleneck for data processing in information systems. In particular, relational database systems encounter this performance problem when dealing with expensive operations such as the join operation. Although a permanent main-memory storage system improves performance [10], we attempt to find a generic solution for large databases that do not fit entirely in main memory and are managed by a DBMS. Therefore, we focus on parallel disk access as a solution to this performance problem.
An array of independent disk drives in combination with some kind of redundancy – either replication (mirroring, RAID level 1) or file striping in combination with additional data for checking – can offer better performance, availability and reliability as well as lower costs than mainframe storage units [24]. The performance of RAID systems depends on the size of the data addressed by the dominating transactions and the size of file striping units [13], [17], [24], [41]. According to Scheuermann et al. [26], very small striping units lead to a balanced distribution of the workload and a good response time when the workload is low. However, when a high throughput during high workloads is desired, larger striping units must be applied. The optimal striping width (number of disks per file) also depends on the workload [40]. In general, these solutions can support fast access to complete records or record files as desired in modern (multi-media) applications such as ‘video on demand’.
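The effect of the striping unit on the number of disks touched by a request can be illustrated with a minimal sketch. It assumes plain round-robin striping without redundancy; the function names `locate` and `disks_touched` and all parameter names are hypothetical and do not come from the cited work.

```python
from typing import Set, Tuple


def locate(offset: int, unit: int, width: int) -> Tuple[int, int]:
    """Map a logical byte offset to (disk index, byte offset on that disk).

    Assumes round-robin striping: `unit` bytes per striping unit,
    `width` disks per file (the striping width).
    """
    unit_no, within = divmod(offset, unit)   # which striping unit, offset inside it
    row, disk = divmod(unit_no, width)       # which row of units, which disk
    return disk, row * unit + within


def disks_touched(offset: int, length: int, unit: int, width: int) -> Set[int]:
    """Return the set of disks that a request of `length` bytes has to access."""
    first = offset // unit
    last = (offset + length - 1) // unit
    return {u % width for u in range(first, last + 1)}


# An 8 KiB request with 1 KiB units is spread over all four disks,
# whereas with 64 KiB units it occupies a single disk.
small_units = disks_touched(0, 8192, 1024, 4)
large_units = disks_touched(0, 8192, 65536, 4)
```

With small units one request keeps every disk busy, which favours response time under a low workload; with large units the other disks remain free to serve other requests, which is consistent with the throughput argument above.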
However, when applications are only interested in parts of records, record-oriented solutions, even if they apply parallel I/O, inevitably lead to the reading of irrelevant data. In order to avoid this superfluous work [15], we can store attribute values in separate (transposed) attribute files. We consider three reasonable solutions for the storage of transposed files:
1. Transposed files can be stored on a single disk drive [6], [7], [29]. This solution is currently applied in the centralized Xplain-DBMS [34], [37], [38], [39]; it circumvents the problem of finding an optimal clustering of attributes [16], [19], [42] and does not need to be revised when the transaction pattern changes. This solution can also be applied when the transaction pattern is unknown. However, a disadvantage is the cost of inserting a single object (instance, record or tuple): the time spent during successive disk accesses is proportional to the number of attributes [18]. The same applies to retrieving more than one attribute from a composite type and also to conditional retrievals requiring the evaluation of two or more attributes.
2. A transposed file can be stored as fragments distributed in an array of disk drives using file striping. An advantage is that existing technology can be applied. However, this solution is byte oriented: it ignores data semantics and cannot be applied by a DBMS because in the classical layered database architecture a DBMS functions on the basis of system tables [16]. Some tables deal with the mapping between data model and file system. Using these tables, queries are transformed into file operations. However, in the layered database architecture further details about files are unknown to a DBMS. Consequently, even though byte-oriented striping of transposed files might function very well, it cannot be optimized by a DBMS.
3. A transposed file can be stored on one of the drives of a disk array. As in solution 1, the problem of finding an optimal clustering of attributes is avoided. The unit of storage complies well with the data model and only the allocation of transposed files needs to be considered when query optimization by a DBMS is desired. Although the third solution might not be optimal, it can be managed by a DBMS on the basis of data semantics.
If data about queries and their frequency is registered, then some degree of optimization also becomes possible. Then a DBMS can detect an unbalanced workload of disk drives and might reallocate some of the transposed files. We prefer the third solution because it performs better than the first solution and unlike the second solution it can be managed by a DBMS.
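A minimal sketch of how registered access frequencies could reveal an unbalanced workload is given here. The data structures, the file names and the `tolerance` threshold are assumptions made for illustration only; the text does not prescribe a particular detection rule.

```python
from typing import Dict, List


def drive_loads(drives: List[str],
                allocation: Dict[str, str],
                access_counts: Dict[str, int]) -> Dict[str, int]:
    """Sum the registered access counts of the transposed files on each drive."""
    loads = {drive: 0 for drive in drives}          # idle drives count as load 0
    for file_name, drive in allocation.items():
        loads[drive] += access_counts.get(file_name, 0)
    return loads


def is_unbalanced(loads: Dict[str, int], tolerance: float = 1.5) -> bool:
    """Hypothetical rule: the busiest drive exceeds `tolerance` times the mean load."""
    mean = sum(loads.values()) / len(loads)
    return mean > 0 and max(loads.values()) > tolerance * mean


drives = ["d0", "d1"]
counts = {"employee.name": 90, "department.name": 70, "employee.salary": 20}

before = {"employee.name": "d0", "department.name": "d0", "employee.salary": "d1"}
after = dict(before, **{"department.name": "d1"})   # reallocate one transposed file

unbalanced_before = is_unbalanced(drive_loads(drives, before, counts))
unbalanced_after = is_unbalanced(drive_loads(drives, after, counts))
```

In this example, moving one transposed file from the busy drive to the quiet one equalizes the load, which is the kind of reallocation a DBMS might consider.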
In order to improve both system reliability and data availability we also propose to duplicate each transposed file on a separate drive, using one disk controller and one cache memory per drive. A drive should not contain more than one of the transposed files associated with a composite type. This enables a DBMS to split small write operations (e.g., inserting a composite object) into parallel disk accesses, leading to a faster insertion rate than the higher RAID levels (level > 1) can offer [13], [24]. These levels require additional operations for reading, constructing and rewriting parity data for each insertion. Both the storage of calculated parity data and data duplication are precautions against the risk of disk failures, which increases with the number of disk drives [24].
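The allocation rule and the splitting of an insertion can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the rule is read strictly, so that the primary files and their duplicates of one composite type all reside on different drives; the names `allocate`, `respects_rule` and `split_insert`, the attribute names and the drive labels are hypothetical; the storage of identifications is left out.

```python
from typing import Dict, List, Tuple

# (composite type, attribute, "primary" or "duplicate")
FileKey = Tuple[str, str, str]


def allocate(type_name: str, attributes: List[str],
             drives: List[str]) -> Dict[FileKey, str]:
    """Give every transposed file and its duplicate a drive of its own."""
    if len(drives) < 2 * len(attributes):
        raise ValueError("not enough drives for this composite type")
    free = iter(drives)
    allocation: Dict[FileKey, str] = {}
    for attr in attributes:
        allocation[(type_name, attr, "primary")] = next(free)
        allocation[(type_name, attr, "duplicate")] = next(free)
    return allocation


def respects_rule(allocation: Dict[FileKey, str]) -> bool:
    """Check that no drive holds more than one file of the same composite type."""
    seen = set()
    for (type_name, _attr, _role), drive in allocation.items():
        if (type_name, drive) in seen:
            return False
        seen.add((type_name, drive))
    return True


def split_insert(type_name: str, obj: Dict[str, object],
                 allocation: Dict[FileKey, str]) -> List[Tuple[str, str, object]]:
    """Turn one object insertion into independent (drive, attribute, value) writes."""
    return [(drive, attr, obj[attr])
            for (t, attr, _role), drive in allocation.items()
            if t == type_name]


alloc = allocate("employee", ["name", "salary"], ["d0", "d1", "d2", "d3"])
writes = split_insert("employee", {"name": "Ann", "salary": 4100}, alloc)
```

Because every write in `writes` targets a different drive, the writes can be issued in parallel, and no parity data has to be read or recomputed.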
A remaining problem is the consistency of the proposed allocation of transposed files. In the case of replicating record files (mirroring, RAID level 1) this does not seem to be a problem. Both byte-oriented file striping and file mirroring can be hidden from programmers [21], [22], [44], because these techniques ignore data semantics.
In previous work [1], [2], [4], based on the concepts of the Xplain DBMS [25], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], we described a framework for the correctness of simple kinds of horizontal fragmentation in geographically distributed environments.
Supplementary to that, the present paper describes a generic solution for the correctness of the proposed local distribution of transposed files and their duplicates on independent disk drives in a system possibly being part of a distributed system. This solution will be specified in abstract terms (concepts), so it is independent of the chosen programming language and the underlying data model. Before introducing these concepts in Section 3, we describe in Section 2 our conceptual foundation: the semantic concepts of Xplain. A pragmatic reason to apply these concepts is improving the performance of the Xplain DBMS. Another reason is the analytical strength of semantic abstraction hierarchies [5], [38]. Section 2 discusses the following subjects related to the Xplain approach:
- Concepts for data modeling (Section 2.1).
- Data manipulation concepts (Section 2.2).
- Static restrictions (assertions) (Section 2.3).
- The evaluation of assertions (Section 2.4).
- Implementation aspects (Section 2.5).
Concepts for data modeling
Data modeling in Xplain is based on the concepts of ‘type’ and ‘attribute’. For example, the following simplified model of a large sales organization (Fig. 1) contains a composite type ‘employee’ having five attributes. We do not show the value domains of types because they are not relevant to the problems discussed here. Inherent to a data model, each instance of a composite type has a single identification [20]. According to the principle of convertibility [30] this instance is also
A framework for a correct local distribution of transposed files
A DBMS has to manage the allocation of semantically related transposed files and their duplicates to different drives on the basis of a correct data distribution scheme. Criteria for the correctness of data distribution schemes can be derived from the textbooks of Ceri and Pelagatti [11] and Özsu and Valduriez [23]:
- disjointness: fragments may not overlap
- completeness: each data element belongs to exactly one fragment
- minimal allocation: each fragment has at least one storage location
- reconstruction
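The first three criteria lend themselves to a mechanical check, as the following minimal sketch suggests. The representation is an assumption made for illustration: fragments are modeled as sets of data elements and storage locations as lists of drive names, and the function names are hypothetical. Completeness is tested here as coverage of all data elements; together with disjointness this yields membership of exactly one fragment. The reconstruction criterion is not covered.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set


def is_disjoint(fragments: Dict[str, Set[str]]) -> bool:
    """Disjointness: no two fragments share a data element."""
    return all(a.isdisjoint(b) for a, b in combinations(fragments.values(), 2))


def is_complete(fragments: Dict[str, Set[str]], elements: Iterable[str]) -> bool:
    """Completeness: the fragments together cover exactly the given data elements."""
    covered: Set[str] = set().union(*fragments.values())
    return covered == set(elements)


def is_minimally_allocated(fragments: Dict[str, Set[str]],
                           locations: Dict[str, List[str]]) -> bool:
    """Minimal allocation: every fragment has at least one storage location."""
    return all(len(locations.get(name, [])) >= 1 for name in fragments)


elements = ["e1", "e2", "e3"]
fragments = {"f1": {"e1", "e2"}, "f2": {"e3"}}
locations = {"f1": ["d0"], "f2": ["d1", "d2"]}

scheme_ok = (is_disjoint(fragments)
             and is_complete(fragments, elements)
             and is_minimally_allocated(fragments, locations))
```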
Additional consistency rules
Applying the meta model of Fig. 4, a DBMS can execute the following activities for the specification of the proposed local data allocation in the following order:
- The registration of a correct number of drives and directories (at least one directory per drive).
- The registration of a correct number of reserved drives and files per composite type.
- The registration of one reserved file for each reserved drive and for each file.
- The registration either of one type file, one index file or one attribute
Discussion
The proposed local distribution of transposed files and their copies within an array of independent disk drives is based on the semantics of data and has the following advantages over storing transposed files on a single disk drive:
- Improved performance through I/O parallelism within and between transactions.
- Improved system reliability.
- Improved data availability.
- Faster and automatic registration of backups.
Acknowledgements
Thanks are due to the anonymous referees for their valuable comments, contributing significantly to improving the present paper.
References
- et al., Effect of schema size on fragmentation design in multirelational databases, Information Systems (1990)
- et al., Overlay striping and optimal parallel I/O for modern applications, Parallel Computing (1998)
- A unifying approach to the modeling of object association based on a common property
- A semantic approach to enforce correctness of data distribution schemes, The Computer Journal (1994)
- J.A. Bakker, Object-orientation based on semantic transformations, in: R.R. Wagner, H. Thoma (Eds.), Proceedings...
- J.A. Bakker, An extended meta model for conditional fragmentation, in: G. Quirchmayr, E. Schweighofer, T.J.M....
- J.A. Bakker, Advantages of a hierarchical presentation of data structures, in: J. Filipe, J. Cordeiro (Eds.),...
- On searching transposed files, ACM Transactions on Database Systems (1979)
- et al., A unifying model of physical databases, ACM Transactions on Database Systems (1982)
- et al., Organization and maintenance of large ordered indexes, Acta Informatica (1972)
- Prefix-B-trees, ACM Transactions on Database Systems
- Distributed Databases: Principles and Systems
- Exploiting inheritance and structure semantics for effective clustering and buffering in an object-oriented DBMS, SIGMOD Record
- RAID: High-performance reliable secondary storage, ACM Computing Surveys
- The Ubiquitous B-tree, ACM Computing Surveys
- An effective approach to vertical partitioning for physical design of relational databases, IEEE Transactions on Software Engineering
- An Introduction to Database Systems
- Disk arrays: high-performance, high-reliability storage subsystems, IEEE Computer
- A heuristic approach to attribute partitioning, SIGMOD Record
- Object Identity, SIGPLAN Notices
- Multiprogramming and concurrency in the parallel file environment, International Journal of Mini and Microcomputers