RSEDP: an effective hybrid data placement algorithm for large-scale storage systems

The Journal of Supercomputing

Abstract

The reliability and scalability of large-scale network storage systems face significant challenges, which call for a data placement algorithm that is reliable, scalable, and efficient. Previous techniques satisfy these requirements only partially. In this work, we develop an effective hybrid approach, RSEDP, which combines reliable replication data placement (RRDP) with scalable and efficient data placement (SEDP) to meet all three requirements. RRDP distributes replicated data over large-scale heterogeneous network storage systems so that the replicas of the same data are placed on different devices and tend not to fall on consecutive devices, achieving a high degree of redundancy and failure resilience. SEDP assigns data evenly among devices according to their weights and adapts well when the system expands or shrinks. To take advantage of both RRDP and SEDP, RSEDP integrates them by categorizing data as hot or cold based on access frequency, placing hot data with RRDP and distributing the remainder with SEDP. Theoretical analysis and an experimental study show that the combined RSEDP increases the degree of redundancy and failure resilience, and offers good scalability and time efficiency with small memory overhead.

Author information

Corresponding author

Correspondence to Nong Xiao.

About this article

Cite this article

Xiao, N., Chen, T. & Liu, F. RSEDP: an effective hybrid data placement algorithm for large-scale storage systems. J Supercomput 55, 103–122 (2011). https://doi.org/10.1007/s11227-009-0357-7

  • DOI: https://doi.org/10.1007/s11227-009-0357-7
