
Data Declustering with Replications

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2973)

Abstract

Declustering distributes blocks of data across multiple devices, enabling parallel I/O access and reducing query response times. Many data declustering schemes have been proposed in the literature. However, these schemes are designed for systems without replication, so the whole system fails if any single disk fails. Assuming that a single disk fails once every five years, a non-replicated system with 100 disks would fail about once every 18 days. Data replication is a technique commonly used in multidisk systems to keep data available during disk failures and, often as a secondary goal, to improve the I/O performance of read-intensive applications. In this paper, we propose LOG, a data declustering scheme for systems with replication. Furthermore, we present a novel replication algorithm. Although the replication algorithm is designed for the LOG declustering scheme, it is also applicable to existing schemes such as DM, GFIB, and GRS. Finally, our experimental results demonstrate that the LOG scheme with the proposed replication algorithm provides a significant performance improvement over state-of-the-art data declustering schemes.
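
The 18-day figure comes from dividing the per-disk failure interval by the number of disks. A minimal sketch, assuming independent failures and the same mean time to failure for every disk (the name `array_mttf_days` is illustrative only):

```python
# Expected time until the first disk failure in an array with no replication.
# Assumption: disks fail independently and share the same mean time to
# failure (MTTF); for exponentially distributed lifetimes the first failure
# among N disks arrives after MTTF / N on average.
DAYS_PER_YEAR = 365.25


def array_mttf_days(disk_mttf_years: float, num_disks: int) -> float:
    """Mean days until any one of `num_disks` disks fails."""
    return disk_mttf_years * DAYS_PER_YEAR / num_disks


if __name__ == "__main__":
    # One failure per disk every five years, 100 disks.
    print(f"{array_mttf_days(5, 100):.1f} days")  # prints "18.3 days"
```

Without replication the array is lost at that first failure, which is why the estimate shrinks linearly as disks are added.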

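The abstract names DM among the existing schemes the replication algorithm applies to. As a hedged illustration only (this is neither the paper's LOG scheme nor its replication algorithm, whose details are not given here), the sketch below uses the classic Disk Modulo rule, which places grid bucket (i, j) on disk (i + j) mod M, plus a hypothetical replica placement that stores a second copy `shift` disks after the primary. Response time is modeled as the number of buckets read from the busiest disk. With two copies, the best read schedule is found by trying per-disk capacities upward from ceil(n / M) and checking each with an augmenting-path (Kuhn-style) bipartite matching. All function names and the `shift` parameter are assumptions for this example.

```python
from itertools import product


def dm_disk(i: int, j: int, m: int) -> int:
    """Disk Modulo (DM): bucket (i, j) of a 2-D grid is stored on disk (i + j) mod m."""
    return (i + j) % m


def replica_disk(i: int, j: int, m: int, shift: int) -> int:
    """Hypothetical replica placement: the second copy sits `shift` disks after the primary."""
    return (dm_disk(i, j, m) + shift) % m


def range_buckets(r0: int, r1: int, c0: int, c1: int) -> list[tuple[int, int]]:
    """All grid buckets covered by the inclusive range query [r0, r1] x [c0, c1]."""
    return list(product(range(r0, r1 + 1), range(c0, c1 + 1)))


def response_time_no_replica(buckets, m: int) -> int:
    """Without replication each bucket has one home, so cost = load of the busiest disk."""
    loads = [0] * m
    for i, j in buckets:
        loads[dm_disk(i, j, m)] += 1
    return max(loads)


def _fits(choices, m: int, cap: int) -> bool:
    """Can every bucket be read from one of its copies with at most `cap` reads per disk?"""
    served = [[] for _ in range(m)]  # bucket indices currently read from each disk

    def augment(b: int, visited: set[int]) -> bool:
        for d in choices[b]:
            if d in visited:
                continue
            visited.add(d)
            if len(served[d]) < cap:
                served[d].append(b)
                return True
            # Disk d is full: try to move one of its buckets to that bucket's other copy.
            for other in list(served[d]):
                if augment(other, visited):
                    served[d].remove(other)
                    served[d].append(b)
                    return True
        return False

    return all(augment(b, set()) for b in range(len(choices)))


def response_time_with_replica(buckets, m: int, shift: int) -> int:
    """Smallest per-disk load achievable when each bucket may be read from either copy."""
    choices = [(dm_disk(i, j, m), replica_disk(i, j, m, shift)) for i, j in buckets]
    cap = -(-len(buckets) // m)  # ceil(n / m): no schedule on m disks can do better
    while not _fits(choices, m, cap):
        cap += 1
    return cap


if __name__ == "__main__":
    M = 4
    query = range_buckets(0, 1, 0, 1)  # a 2 x 2 range query
    print(response_time_no_replica(query, M))             # 2
    print(response_time_with_replica(query, M, shift=2))  # 1
```

For the 2 × 2 query on 4 disks, DM alone puts buckets (0, 1) and (1, 0) on disk 1, while the replicated layout lets one of them be read from disk 3, halving the response time in this toy model. Choosing `shift = M / 2` is only one simple option; the paper's own replication algorithm may place copies differently.
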


References

  1. Agrawal, R., Gupta, A., Sarawagi, S.: Modeling multidimensional databases. In: Proc. of the 13th International Conference on Data Engineering, pp. 232–243 (1997)

  2. Bhatia, R., Sinha, R., Chen, C.M.: Declustering using golden ratio sequences. In: Proc. of International Conference on Data Engineering, ICDE (2000)

  3. Chaudhuri, S., Dayal, U.: An overview of data warehousing and OLAP technology. SIGMOD Record 26(1), 65–74 (1997)

  4. Chen, C.M., Cheng, C.T.: From discrepancy to declustering: Near-optimal multidimensional declustering strategies for range queries. In: ACM PODS (2002)

  5. Chen, C.Y., Lin, H.F., Chang, C.C., Lee, R.C.T.: Optimal bucket allocation design of k-ary MKH files for partial match retrieval. IEEE Transactions on Knowledge and Data Engineering 9(1), 148–160 (1997)

  6. Du, D.H., Sobolewski, J.S.: Disk allocation for Cartesian product files on multiple-disk systems. ACM Trans. Database Systems 7(1), 82–101 (1982)

  7. Du, D.H., Sobolewski, J.S.: Disk allocation methods for binary Cartesian product files. Journal BIT 26, 138–147 (1986)

  8. Faloutsos, C., Metaxas, D.: Disk allocation methods using error correcting codes. IEEE Transactions on Computers 40(8), 907–914 (1991)

  9. Gray, J., Bosworth, A., Layman, A., Pirahesh, H.: Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-total. In: Proc. of 12th International Conference on Data Engineering, pp. 152–159 (1996)

  10. Kim, M.H., Pramanik, S.: Optimal file distribution for partial match retrieval. In: ACM International Conference on Management of Data, SIGMOD (1998)

  11. Lee, T.W., Ling, S.Y., Li, H.G.: Hierarchical compact cube for range-max queries. In: Proceedings of the 26th International Conference on VLDB, pp. 232–241 (2000)

  12. Prabhakar, S., Abdel-Ghaffar, K., Agrawal, D., Abbadi, A.E.: Cyclic allocation of two-dimensional data. In: 14th International Conference on Data Engineering, ICDE (1998)

  13. Prabhakar, S., Abdel-Ghaffar, K., Agrawal, D., Abbadi, A.E.: Efficient retrieval of multidimensional datasets through parallel I/O. In: International Conference on High Performance Computing (1998)

  14. Sinha, R., Bhatia, R., Chen, C.M.: Asymptotically optimal declustering schemes for range queries. In: International Conference on Database Theory, ICDT (2001)

  15. Sung, Y.Y.: Performance analysis of disk modulo allocation method for Cartesian product files. IEEE Transactions on Software Engineering SE-13(9), 1018–1026 (1987)



Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, Y., Sung, S.Y., Xiong, H., Ng, P. (2004). Data Declustering with Replications. In: Lee, Y., Li, J., Whang, KY., Lee, D. (eds) Database Systems for Advanced Applications. DASFAA 2004. Lecture Notes in Computer Science, vol 2973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24571-1_61


  • DOI: https://doi.org/10.1007/978-3-540-24571-1_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21047-4

  • Online ISBN: 978-3-540-24571-1

  • eBook Packages: Springer Book Archive
