Abstract:
To adapt to the increasing storage demands and varying storage redundancy requirements, practical distributed storage systems need to support storage scaling by relocatin...Show MoreMetadata
Abstract:
To adapt to the increasing storage demands and varying storage redundancy requirements, practical distributed storage systems need to support storage scaling by relocating currently stored data to different storage nodes. However, the scaling process inevitably transfers substantial data traffic over the network. Thus, minimizing the bandwidth cost of the scaling process is critical in distributed settings. In this paper, we show that optimal storage scaling is achievable in erasure-coded distributed storage based on network coding, by allowing storage nodes to send encoded data during scaling. We formally prove the information-theoretically minimum scaling bandwidth for both scale-out and scale-in cases. Based on our theoretical findings, we also build a distributed storage system prototype NCScale based on Hadoop Distributed File System, so as to realize network-coding-based scaling while preserving the necessary properties for practical deployment. Experiments on Amazon EC2 show that the scaling time can be reduced by up to 50% over the state-of-the-art.
Published in: IEEE/ACM Transactions on Networking ( Volume: 30, Issue: 1, February 2022)