Abstract
To ensure data availability while saving storage space, storage systems commonly use erasure codes to spread data across multiple storage nodes (or servers). When a node failure makes some data blocks unavailable, the system must reconstruct the missing data to serve read requests, an operation known as a degraded read. However, degraded reads in existing erasure code-based storage systems do not fully utilize node resources and ignore the topology of the nodes. In this paper, we propose a real-time performance evaluation model for storage nodes that evaluates each node by combining metric selection with the analytic hierarchy process. We also design a cost evaluation method that calculates the transmission cost by taking the node topology into account. By combining the node evaluation method with the distance calculation, we propose NADE, an adaptive degraded read optimization strategy, and we implement its node selection method in Ceph. The evaluation results show the efficiency of the proposed method.
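As a rough illustration of the idea in the abstract, the sketch below derives metric weights with the analytic hierarchy process and ranks candidate helper nodes by a mix of performance score and topological closeness. It is not the NADE algorithm itself: the three metrics, the pairwise comparison values, the three-level distance (same host, same rack, different rack), the `alpha` mixing factor, and all node names are assumptions made for this example, and the row geometric-mean method is used as a common approximation of the AHP priority vector.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo)
    return [g / total for g in geo]

def topology_distance(a, b):
    """Hypothetical distance: 0 = same host, 1 = same rack, 2 = different rack."""
    if a["host"] == b["host"]:
        return 0
    if a["rack"] == b["rack"]:
        return 1
    return 2

def select_helpers(reader, nodes, weights, k, alpha=0.5):
    """Return the names of the k nodes with the best combined score.

    alpha (an assumed tuning knob) balances node performance against
    closeness to the node that serves the degraded read.
    """
    max_dist = 2  # largest value topology_distance can return

    def combined(node):
        perf = sum(w * m for w, m in zip(weights, node["metrics"]))
        closeness = 1.0 - topology_distance(reader, node) / max_dist
        return alpha * perf + (1.0 - alpha) * closeness

    ranked = sorted(nodes, key=combined, reverse=True)
    return [node["name"] for node in ranked[:k]]

# Assumed pairwise comparison of three metrics, each normalized to [0, 1]
# with higher meaning "more idle": CPU, disk, network.
PAIRWISE = [
    [1.0, 1 / 2, 1 / 3],
    [2.0, 1.0, 1 / 2],
    [3.0, 2.0, 1.0],
]
WEIGHTS = ahp_weights(PAIRWISE)

# Hypothetical reader and surviving nodes.
READER = {"name": "osd.0", "host": "h0", "rack": "r0"}
NODES = [
    {"name": "osd.1", "host": "h1", "rack": "r0", "metrics": [0.9, 0.8, 0.9]},
    {"name": "osd.2", "host": "h2", "rack": "r1", "metrics": [0.9, 0.8, 0.9]},
    {"name": "osd.3", "host": "h3", "rack": "r0", "metrics": [0.1, 0.1, 0.1]},
    {"name": "osd.4", "host": "h4", "rack": "r1", "metrics": [0.2, 0.2, 0.2]},
]
HELPERS = select_helpers(READER, NODES, WEIGHTS, k=2)
print(WEIGHTS, HELPERS)
```

In this toy setup a lightly loaded node in the reader's rack ranks first, and a lightly loaded remote node still outranks a heavily loaded local one, which is the kind of trade-off a combined performance and distance criterion is meant to capture.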
Acknowledgements
This work was supported by the National Key Research and Development Plan of China under Grant 2016YFB1000303.
Cite this article
Zhang, X., Cai, Y., Liu, Y. et al. NADE: nodes performance awareness and accurate distance evaluation for degraded read in heterogeneous distributed erasure code-based storage. J Supercomput 76, 4946–4975 (2020). https://doi.org/10.1007/s11227-019-02879-6
DOI: https://doi.org/10.1007/s11227-019-02879-6