RSEDP: an effective hybrid data placement algorithm for large-scale storage systems

Xiao, Nong; Chen, Tao; Liu, Fang

doi:10.1007/s11227-009-0357-7

Published: 19 November 2009

Volume 55, pages 103–122, (2011)

The Journal of Supercomputing

Abstract

The reliability and scalability of large-scale network storage systems pose significant challenges, which call for a data placement algorithm that is reliable, scalable, and efficient. Previous techniques satisfy these requirements only partially. In this work, we develop an effective hybrid approach, RSEDP, which combines reliable replication data placement (RRDP) with scalable and efficient data placement (SEDP) to meet all three requirements. RRDP distributes replicated data over large-scale heterogeneous network storage systems so that replicas of the same data are placed on different devices and do not tend to fall on consecutive devices, achieving a high degree of redundancy and failure resilience. SEDP assigns data evenly among devices according to their weights and adapts well when the system is expanded or reduced. To take advantage of both, RSEDP classifies data as hot or cold based on access frequency, places hot data with RRDP, and distributes the remainder with SEDP. Theoretical analysis and experiments show that the combined RSEDP increases the degree of redundancy and failure resilience while offering good scalability and time efficiency with small memory overhead.
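
The hot/cold split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the internals of RRDP or SEDP, so plain rendezvous hashing stands in for RRDP (it guarantees only that replicas land on distinct devices) and weighted rendezvous hashing stands in for SEDP. The function names, the fixed `hot_threshold`, and the replica count are all assumptions made for illustration.

```python
import hashlib
import math


def _hash(*parts):
    """Deterministic 64-bit hash of the given parts."""
    data = "/".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def place_replicated(obj, devices, replicas):
    """Stand-in for RRDP (assumption): rendezvous hashing.

    Ranks devices by a per-(object, device) hash and keeps the top
    `replicas`, so every replica lands on a different device.
    """
    ranked = sorted(devices, key=lambda d: _hash(obj, d), reverse=True)
    return ranked[:replicas]


def place_weighted(obj, weights):
    """Stand-in for SEDP (assumption): weighted rendezvous hashing.

    Each device is chosen with probability proportional to its weight,
    and adding or removing a device only moves the objects that device
    wins or loses.
    """
    def score(device):
        # Map the hash to a float strictly inside (0, 1).
        u = (_hash(obj, device) % 2**52 + 0.5) / 2**52
        return -weights[device] / math.log(u)

    return max(weights, key=score)


def place(obj, access_counts, weights, hot_threshold=100, replicas=3):
    """Hybrid placement: replicate hot objects, single-place cold ones."""
    if access_counts.get(obj, 0) >= hot_threshold:
        return place_replicated(obj, list(weights), replicas)
    return [place_weighted(obj, weights)]


if __name__ == "__main__":
    weights = {"d0": 1.0, "d1": 2.0, "d2": 1.0, "d3": 4.0}
    counts = {"a": 500, "b": 3}
    print(place("a", counts, weights))  # hot: three distinct devices
    print(place("b", counts, weights))  # cold: one weighted device
```

A fixed threshold keeps the sketch short; how the paper actually separates hot from cold data is not stated in the abstract.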

Author information

Authors and Affiliations

National University of Defense Technology, Changsha, P.R. China
Nong Xiao, Tao Chen & Fang Liu

Corresponding author

Correspondence to Nong Xiao.

About this article

Cite this article

Xiao, N., Chen, T. & Liu, F. RSEDP: an effective hybrid data placement algorithm for large-scale storage systems. J Supercomput 55, 103–122 (2011). https://doi.org/10.1007/s11227-009-0357-7

Issue Date: January 2011
DOI: https://doi.org/10.1007/s11227-009-0357-7
