HeMatch: A redundancy layout placement scheme for erasure-coded storages in practical heterogeneous failure patterns

Zhang, Jing; Li, ShanShan; Liao, XiangKe; Peng, ShaoLiang; Liu, XiaoDong; Jia, ZhouYang

doi:10.1007/s11432-014-5276-4

Research Paper
Published: 08 April 2015

Volume 58, pages 1–11, (2015)

Science China Information Sciences

Abstract

Erasure codes are widely used in storage systems to provide data reliability because of their high storage efficiency. Their main shortcoming is access efficiency: accessing unavailable data requires extra data retrieval and decoding. Most existing work designs erasure codes for an ideal failure pattern in which all storage nodes fail at the same rate. In practice, however, physical storage nodes fail at different rates because of heterogeneous hardware, topologies, and application behaviors. In this paper, we consider the heterogeneous failure pattern and analyze how it affects the overall access efficiency and reliability of erasure-coded storage systems. We propose HeMatch, a redundancy layout placement scheme that improves the access efficiency of erasure-coded storage under practical heterogeneous failure patterns. Specifically, we first study how the heterogeneous failure pattern affects access efficiency and propose a general model, based on the Tanner graph, to evaluate and predict the access efficiency for a given failure pattern and redundancy layout. We then propose the redundancy layout placement scheme, which uses the model's evaluation and prediction to match the redundancy layout to the physical storage nodes. Experimental results demonstrate that the proposed model accurately evaluates access efficiency, and that HeMatch reduces the cost of accessing unavailable data by up to 20% while also improving system reliability.
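The idea of matching a redundancy layout to nodes with different failure rates can be illustrated with a minimal Python sketch. It is not the model or the algorithm from the paper: the toy layout (`CHECKS`), the cost rule in `read_cost`, the `UNRECOVERABLE_PENALTY` value, the independence of node failures, and the brute-force search in `best_placement` are all assumptions made for illustration. Each parity check plays the role of a check node in a Tanner graph, and an unavailable block is repaired through the cheapest check whose other members are all available.

```python
from itertools import permutations, product

# Toy layout (an assumption, not taken from the paper):
# blocks 0-3 hold data, block 4 is a local parity over {0, 1},
# block 5 is a global parity over {0, 1, 2, 3}.
DATA_BLOCKS = [0, 1, 2, 3]
CHECKS = [{0, 1, 4}, {0, 1, 2, 3, 5}]
UNRECOVERABLE_PENALTY = 6  # hypothetical cost when no single check can repair


def read_cost(block, failed):
    """Number of block reads needed to serve `block` given the failed set."""
    if block not in failed:
        return 1
    # A check can repair the block only if all its other members are available.
    costs = [
        len(check) - 1
        for check in CHECKS
        if block in check and (check - {block}).isdisjoint(failed)
    ]
    return min(costs, default=UNRECOVERABLE_PENALTY)


def expected_cost(placement, fail_prob):
    """Expected cost of reading every data block once.

    placement[b] is the node that stores block b; fail_prob[n] is the
    probability that node n is unavailable (failures assumed independent).
    """
    total = 0.0
    for pattern in product([False, True], repeat=len(placement)):
        prob = 1.0
        for block, down in enumerate(pattern):
            p = fail_prob[placement[block]]
            prob *= p if down else 1.0 - p
        if prob == 0.0:
            continue
        failed = {block for block, down in enumerate(pattern) if down}
        total += prob * sum(read_cost(b, failed) for b in DATA_BLOCKS)
    return total


def best_placement(fail_prob):
    """Exhaustively search block-to-node assignments (feasible only for tiny n)."""
    nodes = range(len(fail_prob))
    return min(permutations(nodes), key=lambda pl: expected_cost(pl, fail_prob))
```

Calling `best_placement([0.20, 0.10, 0.05, 0.02, 0.01, 0.15])` returns a tuple that maps each block index to a node index. In this toy layout, blocks 2 and 3 can only be repaired through the wider global check, so where they are placed has a larger effect on the expected cost than the placement of blocks 0 and 1. When all nodes fail at the same rate, every placement has the same expected cost, which is the homogeneous case the abstract contrasts against.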

Author information

Authors and Affiliations

School of Computer Science and Technology, National University of Defence Technology, Changsha, 410073, China
Jing Zhang, ShanShan Li, XiangKe Liao, ShaoLiang Peng, XiaoDong Liu & ZhouYang Jia

Corresponding authors

Correspondence to Jing Zhang or ShanShan Li.

About this article

Cite this article

Zhang, J., Li, S., Liao, X. et al. HeMatch: A redundancy layout placement scheme for erasure-coded storages in practical heterogeneous failure patterns. Sci. China Inf. Sci. 58, 1–11 (2015). https://doi.org/10.1007/s11432-014-5276-4

Received: 04 October 2014
Accepted: 17 December 2014
Published: 08 April 2015
Issue Date: June 2015
DOI: https://doi.org/10.1007/s11432-014-5276-4

067101
