Abstract
Large-scale network storage systems face serious reliability and scalability challenges, which call for a data placement algorithm that is reliable, scalable, and efficient. Previous techniques satisfy these requirements only partially. In this work, we develop an effective hybrid approach, RSEDP, which combines reliable replication data placement (RRDP) with scalable and efficient data placement (SEDP) to meet all three requirements. RRDP distributes replicated data over large-scale heterogeneous network storage systems so that the replicas of the same data are placed on different devices and are not biased toward consecutive devices, achieving a high degree of redundancy and failure resilience. SEDP assigns data evenly among devices according to their weights and scales well when the system expands or shrinks. To take advantage of both RRDP and SEDP, RSEDP categorizes data as hot or cold based on access frequency, places hot data with RRDP, and distributes the remainder with SEDP. Theoretical analysis and experimental study show that the combined RSEDP increases the degree of redundancy and failure resilience, and achieves good scalability and time efficiency with small memory overhead.
Cite this article
Xiao, N., Chen, T. & Liu, F. RSEDP: an effective hybrid data placement algorithm for large-scale storage systems. J Supercomput 55, 103–122 (2011). https://doi.org/10.1007/s11227-009-0357-7
DOI: https://doi.org/10.1007/s11227-009-0357-7