Abstract
Declustering distributes blocks of data among multiple devices, enabling parallel I/O access and reducing query response times. Many data declustering schemes have been proposed in the literature. However, these schemes are designed for systems without replication, so they fail if any disk fails. If a single disk fails once every five years on average, a system of 100 disks without replication would experience a failure roughly every 18 days. Data replication is a technique commonly used in multidisk systems to keep data available during disk failures and, often as a secondary goal, to improve the I/O performance of read-intensive applications. In this paper, we propose LOG, a data declustering scheme for systems with replication. We also present a novel replication algorithm. Although the replication algorithm is designed for the LOG declustering scheme, it is also applicable to existing schemes such as DM, GFIB, and GRS. Finally, our experimental results show that the LOG scheme with the proposed replication algorithm provides a significant performance improvement over state-of-the-art data declustering schemes.
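The failure estimate and the idea of declustering can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: disks fail independently at a constant rate, a year has 365 days, and DM is read as the classic disk modulo rule, which places block (i, j) on disk (i + j) mod M. The function names are hypothetical, and the sketch does not implement the LOG scheme or the proposed replication algorithm.

```python
from collections import Counter


def days_between_failures(disk_mttf_years, num_disks):
    """Expected days between disk failures in an array.

    Assumes independent disks with a constant failure rate, so the
    array fails num_disks times as often as a single disk.
    """
    return disk_mttf_years * 365 / num_disks


def disk_modulo(i, j, num_disks):
    """Disk modulo rule (assumed meaning of DM): block (i, j) -> disk."""
    return (i + j) % num_disks


def response_time(rows, cols, num_disks):
    """Blocks read from the busiest disk for a 2-D range query.

    With parallel I/O, the query finishes when the most heavily
    loaded disk has delivered all of its blocks.
    """
    load = Counter(disk_modulo(i, j, num_disks) for i in rows for j in cols)
    return max(load.values())


# One failure per disk every 5 years, 100 disks: about 18 days.
interval = days_between_failures(5, 100)

# A 3x3 range query on 5 disks: 9 blocks, so at best 2 per disk,
# but disk modulo places 3 of them on the same disk.
rt = response_time(range(3), range(3), 5)

print(interval, rt)
```

In this toy case the busiest disk serves three blocks although two would suffice with a perfectly even spread. That gap is the kind of inefficiency declustering schemes try to reduce.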
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, Y., Sung, S.Y., Xiong, H., Ng, P. (2004). Data Declustering with Replications. In: Lee, Y., Li, J., Whang, KY., Lee, D. (eds) Database Systems for Advanced Applications. DASFAA 2004. Lecture Notes in Computer Science, vol 2973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24571-1_61
DOI: https://doi.org/10.1007/978-3-540-24571-1_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21047-4
Online ISBN: 978-3-540-24571-1
eBook Packages: Springer Book Archive