Abstract:
Erasure codes (EC) have become a standard technique in distributed storage systems, replacing data replication by providing similar data availability at lower storage cost. However, the large volume of computation and data migration during EC recovery incurs high I/O and network latency penalties. Although several EC recovery methods have been designed to reduce this penalty through high parallelism, their performance is usually bounded by stragglers, which arise from varying I/O performance among the nodes in the storage system. Moreover, variation in access popularity from upper-layer applications causes dynamic load fluctuation and asymmetry across nodes, which makes scheduling during recovery more difficult. To address these problems, we propose EC-Scheduler, a dynamic load-balanced scheduling algorithm for straggler recovery. EC-Scheduler adjusts the recovery schedule dynamically, with awareness of continuous load fluctuation on the nodes, guaranteeing high parallelism and load balance simultaneously. To demonstrate the effectiveness of EC-Scheduler, we conduct several experiments on a cluster. The results show that, compared to typical recovery schemes such as Fast-PR and EC-Store, EC-Scheduler can achieve a 1.3X speed-up in the recovery process and a 10X improvement in the recovery load imbalance factor.
Date of Conference: 25-28 June 2021
Date Added to IEEE Xplore: 26 August 2021
ISBN Information:
Print on Demand (PoD) ISSN: 1548-615X