Abstract:
Modern data centers employ erasure coding to protect data storage against failures. Given the hierarchical nature of data centers, characterizing the effects of erasure coding and redundancy placement on the reliability of erasure-coded data centers is critical yet unexplored. This paper presents a discrete-event simulator called SimEDC, which enables us to conduct a comprehensive simulation analysis of reliability on erasure-coded data centers. SimEDC reports reliability metrics of an erasure-coded data center based on the configurable inputs of the data center topology, erasure codes, redundancy placement, and failure/repair patterns of different subsystems obtained from statistical models or production traces. It can further accelerate the simulation analysis via importance sampling. Our simulation analysis based on SimEDC shows that placing erasure-coded data in fewer racks generally improves reliability by reducing cross-rack repair traffic, even though it sacrifices rack-level fault tolerance in the face of correlated failures.
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 30, Issue: 12, 01 December 2019)
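
To make the role of importance sampling concrete, the following is a minimal sketch and not SimEDC's actual implementation. It assumes a toy model in which a stripe of n chunks tolerates up to m chunk failures and each chunk fails independently with a small probability p. All function names and parameters (`exact_loss_probability`, `is_loss_probability`, `q`, `trials`) are hypothetical and chosen only for illustration. Failures are sampled with an inflated probability q so that data loss is observed often, and each sample is reweighted by its likelihood ratio so the estimate stays unbiased.

```python
import math
import random


def exact_loss_probability(n, m, p):
    """Exact P(more than m of n independent chunks fail): a binomial tail."""
    return sum(
        math.comb(n, f) * p**f * (1 - p) ** (n - f)
        for f in range(m + 1, n + 1)
    )


def is_loss_probability(n, m, p, q, trials, seed=0):
    """Importance-sampling estimate of the same data-loss probability.

    Chunk failures are drawn with the inflated probability q (assumed
    0 < q < 1) instead of the true probability p. Every sample that
    loses data contributes its likelihood ratio
    (p/q)^f * ((1-p)/(1-q))^(n-f), where f is the number of failed chunks.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        failures = sum(rng.random() < q for _ in range(n))
        if failures > m:  # more failures than the code can tolerate
            weight = (p / q) ** failures * ((1 - p) / (1 - q)) ** (n - failures)
            total += weight
    return total / trials


if __name__ == "__main__":
    # Toy parameters (assumptions): 14 chunks, tolerate 4 failures.
    n, m, p = 14, 4, 0.001
    print("exact:", exact_loss_probability(n, m, p))
    print("importance sampling:", is_loss_probability(n, m, p, q=0.3, trials=20000))
```

With a true failure probability this small, naive sampling would almost never observe a data-loss event in a feasible number of trials, which is why biasing the failure distribution and correcting with weights is attractive for reliability simulation.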