NADE: nodes performance awareness and accurate distance evaluation for degraded read in heterogeneous distributed erasure code-based storage

Zhang, Xingjun; Cai, Yi; Liu, Yunfei; Xu, Zhiwei; Dong, Xiaoshe

doi:10.1007/s11227-019-02879-6

Published: 09 May 2019

The Journal of Supercomputing, volume 76, pages 4946–4975 (2020)

Abstract

To ensure data availability while saving storage space, storage systems usually spread data across multiple storage nodes (or servers) using erasure codes. When a node failure causes some data blocks to be lost, the system must reconstruct the complete data to serve read requests, an operation known as a degraded read. However, degraded reads in erasure code-based storage systems do not fully utilize node resources and ignore the nodes' topology. In this paper, we propose a real-time performance evaluation model for storage nodes that evaluates each node by combining metric selection with the analytic hierarchy process. We also design a cost evaluation method that calculates the transmission cost by taking the nodes' topology into account. By combining the node evaluation method with the distance calculation, we propose an adaptive degraded read optimization strategy, NADE. We further implement the NADE node selection method in Ceph. The evaluation results show the efficiency of the proposed method.
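The idea of ranking candidate nodes by both performance and topological distance can be illustrated with a minimal Python sketch. The abstract does not give the metrics, the pairwise comparison matrix, or the cost formula, so everything here is an assumption made for illustration: the three metrics, the comparison values, the `alpha` trade-off parameter, the linear cost, and the node names. The weights are approximated with the row geometric-mean method, a common simplification of the analytic hierarchy process; this is not the authors' implementation.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def node_score(metrics, weights):
    """Weighted sum of metrics normalised to [0, 1], where higher is better."""
    return sum(w * m for w, m in zip(weights, metrics))


def select_helpers(nodes, weights, k, alpha=0.5):
    """Return the k node names with the lowest combined cost.

    nodes maps a name to (metrics, distance), with distance normalised
    to [0, 1]. alpha is a hypothetical trade-off between performance
    and distance; it does not come from the paper.
    """
    def cost(item):
        metrics, distance = item[1]
        return alpha * (1.0 - node_score(metrics, weights)) + (1.0 - alpha) * distance

    ranked = sorted(nodes.items(), key=cost)
    return [name for name, _ in ranked[:k]]


# Hypothetical pairwise comparisons for three assumed metrics:
# (CPU idle ratio, disk bandwidth, network bandwidth).
PAIRWISE = [
    [1.0, 1 / 3, 1 / 5],
    [3.0, 1.0, 1 / 2],
    [5.0, 2.0, 1.0],
]
WEIGHTS = ahp_weights(PAIRWISE)

# Illustrative nodes: (normalised metrics, normalised topology distance).
NODES = {
    "osd.0": ([0.9, 0.8, 0.9], 0.1),  # fast and close
    "osd.1": ([0.9, 0.9, 0.9], 0.9),  # fast but far away
    "osd.2": ([0.2, 0.3, 0.2], 0.1),  # close but slow
    "osd.3": ([0.7, 0.6, 0.8], 0.3),  # moderate on both
}
HELPERS = select_helpers(NODES, WEIGHTS, k=2)
print(HELPERS)
```

With these made-up inputs the fast, nearby node and the moderate node are preferred over a node that is fast but distant and one that is close but slow, which is the kind of trade-off a combined performance and distance cost is meant to capture.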

Acknowledgements

This work was supported by the National Key Research and Development Plan of China under Grant 2016YFB1000303.

Author information

Authors and Affiliations

Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, 28# XianNing West Road, Xi’an, 710049, China
Xingjun Zhang, Yi Cai, Yunfei Liu, Zhiwei Xu & Xiaoshe Dong

Corresponding author

Correspondence to Xingjun Zhang.

About this article

Cite this article

Zhang, X., Cai, Y., Liu, Y. et al. NADE: nodes performance awareness and accurate distance evaluation for degraded read in heterogeneous distributed erasure code-based storage. J Supercomput 76, 4946–4975 (2020). https://doi.org/10.1007/s11227-019-02879-6

Issue Date: July 2020
