Abstract
The failure of individual unreliable commodity components is very common in large-scale distributed storage systems, and many studies have been proposed to ensure data reliability in such systems. Among these approaches, erasure codes are widely used in production storage systems, such as the Hadoop Distributed File System (HDFS), to provide high fault tolerance with lower storage overhead. However, recovering an erasure-coded storage system from a node failure usually consumes substantial cross-node and cross-rack bandwidth, which slows failure recovery and wastes additional resources. In this paper, we improve the erasure-code storage strategy of Hadoop 3.x and propose H\(^{RS(n,k)}\)-V\(^{RS(n',k')}\), abbreviated as H-V, which adds RS parity inside each data node. This design effectively reduces cross-node and cross-rack data transmission during recovery, lowers cross-rack bandwidth occupation, and improves recovery efficiency. Theoretical analysis shows that, compared with traditional RS erasure-coded storage, H-V reduces the cross-node and cross-rack bandwidth of RS during data recovery by at least 25% and 62.5%, respectively. Compared with D\(^3\), H-V reduces storage redundancy by up to 19.7% while reducing the cross-node and cross-rack recovery bandwidth of D\(^3\) by 25% and 12.5%, respectively.
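The abstract does not give the H-V construction, so the following is only a minimal sketch of the general two-level idea it describes: a vertical parity stored inside each node plus a horizontal parity across nodes, with local repair preferred because it needs no cross-node reads. As a simplifying assumption, each level uses a single XOR parity (the simplest MDS code, with byte-wise XOR as addition in GF(2^8)) in place of the RS(n, k) and RS(n', k') codes that H-V actually uses. All function names here are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings (addition in GF(2^8))."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode_node(sub_blocks):
    """Vertical step (hypothetical): append one intra-node parity sub-block.
    A single XOR parity stands in for the intra-node RS code."""
    return list(sub_blocks) + [xor_blocks(sub_blocks)]

def build_stripe(data_nodes):
    """Horizontal step (hypothetical): add a parity node that holds the
    position-wise XOR of every encoded data node."""
    nodes = [encode_node(sub_blocks) for sub_blocks in data_nodes]
    parity_node = [xor_blocks(column) for column in zip(*nodes)]
    return nodes + [parity_node]

def repair_sub_block(node, lost):
    """Local repair: rebuild one lost sub-block from survivors on the same
    node. Returns (block, cross-node reads); the read count is always 0."""
    survivors = [b for i, b in enumerate(node) if i != lost]
    return xor_blocks(survivors), 0

def repair_sub_block_horizontal(stripe, failed_node, lost):
    """Fallback repair: read the same position from every other node in the
    stripe. Each read crosses a node boundary, so the count is len(reads)."""
    reads = [node[lost] for i, node in enumerate(stripe) if i != failed_node]
    return xor_blocks(reads), len(reads)

# Two data nodes with two sub-blocks each, plus one horizontal parity node.
stripe = build_stripe([[b"ab", b"cd"], [b"ef", b"gh"]])
print(repair_sub_block(stripe[0], 1))             # (b'cd', 0)
print(repair_sub_block_horizontal(stripe, 0, 1))  # (b'cd', 2)
```

Both paths recover the same sub-block. Only the horizontal path generates cross-node traffic, which illustrates why adding parity inside data nodes can lower recovery bandwidth. The sketch does not reproduce the specific savings reported in the abstract.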
Acknowledgements
The research was supported in part by the National Natural Science Foundation of China (Grant No. 61872043), the Qin Xin Talents Cultivation Program of Beijing Information Science & Technology University (No. QXTCP B201904), the State Key Laboratory of Computer Architecture (ICT, CAS) under Grant No. CARCHA202103, the Key Scientific and Technological Projects of Henan Province (Grant No. 202102210174), and the Key Scientific Research Projects of Henan Higher School (Grant No. 19A520043).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mu, T., Song, Y., Yang, M., Wang, B., Zhao, J. (2022). H-V: An Improved Coding Layout Based on Erasure Coded Storage System. In: Rage, U.K., Goyal, V., Reddy, P.K. (eds) Database Systems for Advanced Applications. DASFAA 2022 International Workshops. DASFAA 2022. Lecture Notes in Computer Science, vol 13248. Springer, Cham. https://doi.org/10.1007/978-3-031-11217-1_15
DOI: https://doi.org/10.1007/978-3-031-11217-1_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11216-4
Online ISBN: 978-3-031-11217-1
eBook Packages: Computer Science, Computer Science (R0)