Data Declustering with Replications

Liu, Yao; Sung, Sam Y.; Xiong, Hui; Ng, Peter

doi:10.1007/978-3-540-24571-1_61

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2973)

Abstract

Declustering distributes blocks of data among multiple devices, enabling parallel I/O access and reducing query response times. Many data declustering schemes have been proposed in the literature. However, these schemes are designed for systems without replication, so the system fails as soon as any single disk fails. Assuming that a single disk fails once every five years, a non-replicated system with 100 disks would fail about once every 18 days. Data replication is a technique commonly used in multidisk systems to keep data available during disk failures and, often as a secondary goal, to improve the I/O performance of read-intensive applications. In this paper, we propose the LOG data declustering scheme for systems with replication. Furthermore, we present a novel replication algorithm. Although the replication algorithm is designed for the LOG declustering scheme, it is also applicable to existing schemes such as DM, GFIB, and GRS. Finally, as our experimental results demonstrate, the LOG scheme with the proposed replication algorithm provides a significant performance improvement over state-of-the-art data declustering schemes.
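
The 18-day figure in the abstract can be checked with a short calculation. The sketch below is not taken from the paper: it assumes that disks fail independently at a constant rate, that a year has 365 days, and that the loss of any one disk brings down the whole non-replicated system. The function name `system_failure_interval_days` is hypothetical and used only for illustration.

```python
def system_failure_interval_days(disk_mttf_years: float,
                                 num_disks: int,
                                 days_per_year: float = 365.0) -> float:
    """Expected days between failures of a system without replication.

    Assumption (not from the paper): disks fail independently at a
    constant rate, so the failure rate of the whole system is the sum
    of the per-disk rates and the expected interval shrinks by a
    factor of num_disks.
    """
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    if disk_mttf_years <= 0:
        raise ValueError("disk_mttf_years must be positive")
    return disk_mttf_years * days_per_year / num_disks


if __name__ == "__main__":
    # One failure per disk every five years, 100 disks in the system.
    interval = system_failure_interval_days(disk_mttf_years=5, num_disks=100)
    print(f"Expected interval between system failures: {interval:.2f} days")
```

Under these assumptions, five years is 1825 days, and dividing by 100 disks gives 18.25 days, which matches the roughly 18 days quoted in the abstract.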

Author information

Authors and Affiliations

Department of Computer Science, National University of Singapore,
Yao Liu & Sam Y. Sung
Department of Computer Science, University of Minnesota – Twin Cities,
Hui Xiong
Department of Computer Science, University of Texas – Pan American,
Peter Ng

Editor information

Editors and Affiliations

Department of Computer Science, KAIST, 373-1 Guseong-dong Yuseong-gu, 305-701, Daejeon, Korea
YoonJoon Lee
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
Computer Science Department and, Advanced Information Technology Research Center(AITrc), KAIST, Korea
Kyu-Young Whang
Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, 305-701, Daejeon, Republic of Korea
Doheon Lee

About this paper

Cite this paper

Liu, Y., Sung, S.Y., Xiong, H., Ng, P. (2004). Data Declustering with Replications. In: Lee, Y., Li, J., Whang, KY., Lee, D. (eds) Database Systems for Advanced Applications. DASFAA 2004. Lecture Notes in Computer Science, vol 2973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24571-1_61

DOI: https://doi.org/10.1007/978-3-540-24571-1_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21047-4
Online ISBN: 978-3-540-24571-1
eBook Packages: Springer Book Archive
