Abstract:
This paper addresses the issue of erasure-coded data archival, where (k + r, k) erasure codes are employed to archive rarely accessed replicas. The traditional synchronous encoding process neither leverages the existence of replicas nor handles encoding operations in a decentralized manner. To overcome these drawbacks, we exploit pipelined encoding processes to boost data archival performance on storage clusters. First, we propose two data layouts, called [D + P]cd and [3X]cd, by applying a chained-declustering mechanism to Mirrored RAID-5 and triplication redundancy groups, respectively. Second, building on the [D + P]cd and [3X]cd layouts, we design two archiving schemes, named DP and 3X, which exhibit three salient features: (i) exploiting data locality: each involved node reads two or three local blocks for encoding; (ii) decentralized computation load: encoding operations are distributed among k nodes; and (iii) parallel archival processing: two or three encoding pipelines are deployed simultaneously to generate parity blocks. We implement the DP and 3X schemes, as well as three existing solutions (SynE, DE, and RapidRAID), on a real-world storage cluster. Experimental results show that our archival schemes outperform the other three solutions in archiving time by a factor of at least 3.41 on a nine-node storage cluster. The experiments strongly indicate that the performance bottleneck of SynE lies in its block-receiving stage, whereas for both DE and RapidRAID, archiving time is dominated by disk I/O rather than network traffic.
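The idea behind decentralized, pipelined encoding can be illustrated with a small sketch. It contrasts a synchronous encoder, where one node gathers all k data blocks and computes the r parity blocks by itself, with a pipeline where each of the k nodes folds only its local block into r running partial parities and forwards them to the next node. Because parity is a linear combination over GF(2^8), both approaches yield identical parity blocks. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not name the code construction, so a Cauchy Reed-Solomon coefficient matrix over GF(2^8) (polynomial 0x11d) is assumed here. The function names (`cauchy_coef`, `encode_centralized`, `pipeline_stage`, `encode_pipelined`) are hypothetical. The sketch models a single pipeline and does not model the two or three concurrent pipelines or the chained-declustering layouts.

```python
# Hypothetical sketch: centralized vs. pipelined (k + r, k) erasure encoding.
# Assumption: Cauchy Reed-Solomon coefficients over GF(2^8), polynomial 0x11d.

# Build exp/log tables for GF(2^8); 2 is a primitive element for 0x11d.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # duplicate so LOG[a] + LOG[b] never needs a modulo


def gf_mul(a, b):
    """Multiply two GF(2^8) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse of a nonzero GF(2^8) element."""
    return EXP[255 - LOG[a]]


def cauchy_coef(j, i, r):
    """Coefficient of data block i in parity j: 1 / (x_j + y_i), x_j = j, y_i = r + i.
    Requires k + r <= 256 so that all x_j and y_i are distinct field elements."""
    return gf_inv(j ^ (r + i))  # addition in GF(2^8) is XOR


def encode_centralized(blocks, r):
    """Synchronous style: one node holds all k blocks and computes every parity."""
    size = len(blocks[0])
    parities = [bytearray(size) for _ in range(r)]
    for j in range(r):
        for i, blk in enumerate(blocks):
            c = cauchy_coef(j, i, r)
            for b in range(size):
                parities[j][b] ^= gf_mul(c, blk[b])
    return [bytes(p) for p in parities]


def pipeline_stage(i, local_block, partials, r):
    """Node i folds its local block into each of the r partial parities and forwards them."""
    out = []
    for j in range(r):
        c = cauchy_coef(j, i, r)
        out.append(bytes(p ^ gf_mul(c, d) for p, d in zip(partials[j], local_block)))
    return out


def encode_pipelined(blocks, r):
    """Pipelined style: each loop iteration stands in for a different node in the chain."""
    size = len(blocks[0])
    partials = [bytes(size) for _ in range(r)]  # the first node starts from zero parities
    for i, blk in enumerate(blocks):
        partials = pipeline_stage(i, blk, partials, r)
    return partials  # the last node holds the r finished parity blocks


if __name__ == "__main__":
    import os

    k, r = 6, 3
    data = [os.urandom(4096) for _ in range(k)]
    assert encode_pipelined(data, r) == encode_centralized(data, r)
    print("pipelined parity matches centralized parity")
```

In the pipelined form, no single node receives all k blocks. Each link carries only r partial-parity blocks, and each node reads only its own data. This is the property the abstract credits for avoiding SynE's block-receiving bottleneck.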
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 26, Issue: 11, 01 November 2015)