Abstract
Erasure codes are widely used in storage systems to provide data reliability because of their high storage efficiency. Their main shortcoming is access efficiency: accessing unavailable data requires extra data retrieval and decoding. Most existing work designs erasure codes for an idealized failure pattern in which all storage nodes fail at the same rate. In practice, however, physical storage nodes fail at different rates because of heterogeneous hardware, topologies, and application behaviors. In this paper, we consider the heterogeneous failure pattern and analyze how it affects the overall access efficiency and reliability of erasure-coded storage systems. We propose HeMatch, a redundancy layout placement scheme that improves the access efficiency of erasure-coded storage under practical heterogeneous failure patterns. Specifically, we first study how the heterogeneous failure pattern affects access efficiency and propose a general model, based on the Tanner graph, that evaluates and predicts the access efficiency for a given failure pattern and redundancy layout. We then propose the redundancy layout placement scheme, which uses the model's evaluation and prediction to match the redundancy layout to the physical storage nodes. Experimental results demonstrate that the proposed model evaluates access efficiency accurately, and that HeMatch reduces the cost of accessing unavailable data by up to 20% while also improving system reliability.
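The idea of matching a redundancy layout to nodes with different failure rates can be illustrated with a minimal sketch. It is not the Tanner-graph model or the HeMatch algorithm from the paper. It assumes a deliberately simplified cost: each logical block has a fixed repair cost (the number of blocks read to rebuild it), each node has an independent failure probability, and the expected cost is the sum of their products. The names `expected_cost`, `match_layout`, and `brute_force_best`, and all numbers, are hypothetical.

```python
from itertools import permutations


def expected_cost(placement, repair_cost, fail_prob):
    """Expected unavailable-data access cost under the simplified model.

    placement[b] is the index of the node that stores logical block b.
    Assumption: cost = sum over blocks of repair_cost * node failure probability.
    """
    return sum(repair_cost[b] * fail_prob[n] for b, n in enumerate(placement))


def match_layout(repair_cost, fail_prob):
    """Pair the cheapest-to-repair blocks with the most failure-prone nodes.

    For the sum-of-products cost above, this pairing minimizes the sum
    (rearrangement inequality). It is an illustration, not the paper's scheme.
    """
    blocks = sorted(range(len(repair_cost)), key=lambda b: repair_cost[b])
    nodes = sorted(range(len(fail_prob)), key=lambda n: fail_prob[n], reverse=True)
    placement = [None] * len(repair_cost)
    for b, n in zip(blocks, nodes):
        placement[b] = n
    return placement


def brute_force_best(repair_cost, fail_prob):
    """Minimum expected cost over all placements (only feasible for tiny inputs)."""
    return min(
        expected_cost(p, repair_cost, fail_prob)
        for p in permutations(range(len(fail_prob)))
    )


# Hypothetical layout: two blocks that are cheap to repair, two that are expensive.
repair_cost = [2, 2, 4, 4]
# Hypothetical per-node failure probabilities (heterogeneous).
fail_prob = [0.01, 0.05, 0.02, 0.10]

naive = [0, 1, 2, 3]  # block i stored on node i
matched = match_layout(repair_cost, fail_prob)

print("naive cost:  ", expected_cost(naive, repair_cost, fail_prob))
print("matched cost:", expected_cost(matched, repair_cost, fail_prob))
```

In this toy setting, the matched placement stores the cheap-to-repair blocks on the least reliable nodes, so failures tend to hit blocks that are inexpensive to reconstruct. A realistic model would also account for correlated failures and for the decoding dependencies between blocks, which this sketch ignores.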
About this article
Cite this article
Zhang, J., Li, S., Liao, X. et al. HeMatch: A redundancy layout placement scheme for erasure-coded storages in practical heterogeneous failure patterns. Sci. China Inf. Sci. 58, 1–11 (2015). https://doi.org/10.1007/s11432-014-5276-4
DOI: https://doi.org/10.1007/s11432-014-5276-4
Keywords
- network storage
- erasure-coded storage
- access efficiency
- redundancy placement
- heterogeneous failure pattern