Abstract
There is a variety of erasure-coded data placement schemes that make a great contribution to data repair. To repair data, the operator should replace the failed node with a new node first. However, almost all these schemes assume the node replacement process (NRP) is done quickly, which is not true. Generally, NRP includes failure detection and failure repair, which may take hours or even days. Long delay of replacement may cause the recovered data lost again due to the lack of durability. To improve data durability, we propose a novel scheme called Sector Error-Oriented Durability-Aware Fast Repair (SEDRepair), which carefully couples data migration and data reconstruction in parallel for data repair. We conduct mathematical analysis and compute the optimal repair in our model. The results show that, compared to the traditional erasure coding methods, SEDRepair saves the repair time by up to 60% in most cases and improves data durability while keeping minimal storage.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Blaum, M., Brady, J., Bruck, J., Menon, J.: Evenodd: an efficient scheme for tolerating double disk failures in raid architectures. IEEE Trans. Comput. 44(2), 192–202 (1995)
Blömer, J., Kalfane, M., Karp, R., Karpinski, M., Luby, M., Zuckerman, D.: An XOR-based erasure-resilient coding scheme (1995)
Chan, J.C., Ding, Q., Lee, P.P., Chan, H.H.: Parity logging with reserved space: towards efficient updates and recovery in erasure-coded clustered storage. In: 12th \(\{\)USENIX\(\}\) Conference on File and Storage Technologies (\(\{\)FAST\(\}\) 2014), pp. 163–176 (2014)
Emami, T.K.: Partial disk failures and improved storage resiliency, November 2011
Ford, D., et al.: Availability in globally distributed storage systems (2010)
Huang, C., Li, J., Chen, M.: On optimizing XOR-based codes for fault-tolerant storage applications. In: 2007 IEEE Information Theory Workshop, pp. 218–223. IEEE (2007)
Huang, C., et al.: Erasure coding in windows azure storage, p. 2 (2012)
Huang, C., Xu, L.: STAR: an efficient coding scheme for correcting triple storage node failures. IEEE Trans. Comput. 57, 889–901 (2008)
Huang, P., et al.: Gray failure: the achilles’ heel of cloud-scale systems. In: Proceedings of the 16th Workshop on Hot Topics in Operating Systems, pp. 150–155 (2017)
Khan, O., Burns, R., Plank, J.S., Pierce, W., Huang, C.: Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads, p. 20 (2012)
Muralidhar, S., et al.: F4: Facebook’s warm \(\{\)BLOB\(\}\) storage system. In: 11th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 2014), pp. 383–398 (2014)
Nachiappan, R., Javadi, B., Calheiros, R.N., Matawie, K.M.: Cloud storage reliability for big data applications: a state of the art survey. J. Netw. Comput. Appl. 97, 35–47 (2017)
Ovsiannikov, M., Rus, S., Reeves, D., Sutter, P., Rao, S., Kelly, J.: The quantcast file system. Proc. VLDB Endow. 6(11), 1092–1101 (2013)
Plank, J.S.: The raid-6 liberation code. Int. J. High Perform. Comput. Appl. 23(3), 242–251 (2009)
Plank, J.S., Simmerman, S., Schuman, C.D.: Jerasure: a library in C/C++ facilitating erasure coding for storage applications-version 1.2. University of Tennessee, Technical report, CS-08-627, 23 (2008)
Plank, J.S., Xu, L.: Optimizing cauchy reed-solomon codes for fault-tolerant network storage applications. In: Fifth IEEE International Symposium on Network Computing and Applications (NCA 2006), pp. 173–180. IEEE (2006)
Reed, I.S., Solomon, G.: Polynomial codes over certain finite fields. J. Soc. Ind. Appl. Math. 8(2), 300–304 (1960)
Shen, J., Zhang, K., Gu, J., Zhou, Y., Wang, X.: Efficient scheduling for multi-block updates in erasure coding based storage systems. IEEE Trans. Comput. 67(4), 573–581 (2017)
Shen, Z., Lee, P.P.C.: Cross-rack-aware updates in erasure-coded data centers, p. 80 (2018)
Shen, Z., Li, X., Lee, P.P.C.: Fast predictive repair in erasure-coded storage, pp. 556–567 (2019)
Vajha, M., et al.: Clay codes: Moulding MDS codes to yield an MSR code. In: 16th USENIX Conference on File and Storage Technologies (FAST 2018), Oakland, CA, pp. 139–154. USENIX Association, February 2018
Wicker, S.B., Bhargava, V.K.: Reed-Solomon Codes and Their Applications. Wiley, Hoboken (1999)
Zhou, T., Tian, C.: Fast erasure coding for data storage: a comprehensive study of the acceleration techniques. ACM Trans. Storage (TOS) 16(1), 1–24 (2020)
Acknowledgment
We thank the anonymous reviewers for their insightful feedback. We also appreciate Jingwei Li, Zhirong Shen and Hu Xiong for their sincere help.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Xiao, Y., Zhou, S., Zhong, L., Zhang, Z. (2021). Sector Error-Oriented Durability-Aware Fast Repair in Erasure-Coded Cloud Storage Systems. In: Tan, Y., Shi, Y., Zomaya, A., Yan, H., Cai, J. (eds) Data Mining and Big Data. DMBD 2021. Communications in Computer and Information Science, vol 1454. Springer, Singapore. https://doi.org/10.1007/978-981-16-7502-7_41
Download citation
DOI: https://doi.org/10.1007/978-981-16-7502-7_41
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-7501-0
Online ISBN: 978-981-16-7502-7
eBook Packages: Computer ScienceComputer Science (R0)