HeMatch: A redundancy layout placement scheme for erasure-coded storages in practical heterogeneous failure patterns

HeMatch: A redundant data placement strategy for erasure-coded storage systems based on a practical heterogeneous failure model

  • Research Paper
  • Science China Information Sciences

Abstract

Erasure codes are widely used in storage systems to provide data reliability with high storage efficiency. Their main shortcoming is access efficiency: reading unavailable data requires retrieving extra data and decoding it. Most existing work designs erasure codes for an ideal failure pattern in which all storage nodes fail at the same rate. In practice, however, physical storage nodes fail at different rates because of heterogeneous hardware, topologies, and application behaviors. In this paper, we consider the heterogeneous failure pattern and analyze how it affects the overall access efficiency and reliability of erasure-coded storage systems. We propose HeMatch, a redundancy layout placement scheme that improves the access efficiency of erasure-coded storage under practical heterogeneous failure patterns. Specifically, we first study how the heterogeneous failure pattern affects access efficiency and propose a general model, based on the Tanner graph, that evaluates and predicts the access efficiency for a specific failure pattern and redundancy layout. We then propose the redundancy layout placement scheme, which uses the model's evaluation and prediction to match the redundancy layout to the physical storage nodes. The experimental results demonstrate that the proposed model accurately evaluates access efficiency, and that HeMatch reduces the cost of accessing unavailable data by up to 20% while also improving system reliability.
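
To make the idea of matching a redundancy layout to nodes with different failure rates concrete, here is a minimal sketch. It is not the model or the algorithm from the paper. It assumes a hypothetical toy code (three data blocks, one local parity, one global parity), independent node failures, single-step repair through one check of the Tanner graph, and an arbitrary `penalty` charged when no check can rebuild a block. All names (`CHECKS`, `expected_cost`, `best_placement`) are illustrative.

```python
from itertools import permutations, product

# Hypothetical toy code: data blocks 0, 1, 2 and parity blocks 3, 4.
# Each set is one parity equation, i.e. one check node of the Tanner graph.
# Block 3 is a local parity over blocks 0 and 1; block 4 is a global parity.
CHECKS = [frozenset({0, 1, 3}), frozenset({0, 1, 2, 4})]
DATA_BLOCKS = (0, 1, 2)
NUM_BLOCKS = 5


def degraded_read_cost(block, failed, penalty):
    """Blocks that must be read to rebuild `block`, given the failed set.

    Assumption: repair uses a single check whose other members are all
    available; if no such check exists, the fixed `penalty` is charged.
    """
    costs = [
        len(check) - 1
        for check in CHECKS
        if block in check and not (failed & (check - {block}))
    ]
    return min(costs) if costs else penalty


def expected_cost(placement, fail_prob, penalty=10.0):
    """Expected degraded-read cost, summed over the data blocks.

    placement[b] is the node that stores block b; fail_prob[n] is the
    probability that node n is unavailable (independence is assumed).
    """
    total = 0.0
    # Enumerate every failure pattern exactly; feasible only for tiny codes.
    for pattern in product((False, True), repeat=NUM_BLOCKS):
        failed = frozenset(b for b in range(NUM_BLOCKS) if pattern[b])
        prob = 1.0
        for b in range(NUM_BLOCKS):
            p = fail_prob[placement[b]]
            prob *= p if pattern[b] else 1.0 - p
        for b in DATA_BLOCKS:
            if b in failed:
                total += prob * degraded_read_cost(b, failed, penalty)
    return total


def best_placement(fail_prob):
    """Exhaustively search all block-to-node assignments."""
    return min(
        permutations(range(NUM_BLOCKS)),
        key=lambda placement: expected_cost(placement, fail_prob),
    )


if __name__ == "__main__":
    # Node 0 is assumed to be far less reliable than the other four nodes.
    rates = [0.3, 0.01, 0.01, 0.01, 0.01]
    placement = best_placement(rates)
    print(placement, expected_cost(placement, rates))
```

In this toy setting the search shows why placement matters: a block that can only be rebuilt through the wide global check costs more to recover than one covered by the small local check, so the expected cost depends on which block sits on the unreliable node. Exhaustive enumeration is used only because the example is tiny; it does not scale to realistic code sizes.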

Author information

Corresponding authors

Correspondence to Jing Zhang or ShanShan Li.

About this article

Cite this article

Zhang, J., Li, S., Liao, X. et al. HeMatch: A redundancy layout placement scheme for erasure-coded storages in practical heterogeneous failure patterns. Sci. China Inf. Sci. 58, 1–11 (2015). https://doi.org/10.1007/s11432-014-5276-4

  • DOI: https://doi.org/10.1007/s11432-014-5276-4
