Abstract
Existing triple-failure-tolerant codes assume that failures are independent and instantaneous. Such assumptions overlook the underlying mechanism by which multiple failures occur and ignore the effect of the reconstruction window, so these codes are not adapted to the failure patterns seen in real-world applications. As a result, the third parity drive is almost idle, because it is set to handle only the triple-failure scenario while lower-level failure situations go unattended. Furthermore, the single-failure rebuild problem worsens as disk capacity increases, which lowers system reliability and impairs user experience. To address these problems, this study develops RAID-6Plus, a fast reconstructable coding scheme extended from RAID-6. RAID-6Plus maintains a smaller reconstruction window by recoding the third parity drive. Existing codes provide absolute reliability for triple failures via full combinations. In contrast, RAID-6Plus remakes the third parity drive with short combinations, which can heavily reuse overlapped elements during reconstruction. The short combinations shorten the reconstruction window of a single failure, which avoids the overlapping of multiple failures within that window. This capability of multi-failure degradation gives RAID-6Plus (1) better system performance than RTP and STAR and (2) higher reliability than RAID-6.
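To make the role of the reconstruction window concrete, the following minimal sketch estimates the chance that another disk fails while a single-failure rebuild is still running. It is not the paper's reliability model: it assumes independent, exponentially distributed disk lifetimes, which is the simplification the abstract criticizes, and real failure patterns may differ. The function name, the array size, the MTTF, and the two window lengths are all hypothetical values chosen for illustration.

```python
import math


def p_additional_failure(n_disks, window_hours, mttf_hours):
    """Probability that at least one surviving disk fails during the rebuild.

    Assumption (illustrative only): disk lifetimes are independent and
    exponentially distributed with the given mean time to failure.
    """
    surviving = n_disks - 1  # one disk has already failed and is being rebuilt
    return 1.0 - math.exp(-surviving * window_hours / mttf_hours)


# Hypothetical numbers: a 12-disk array, a nominal MTTF of 1,000,000 hours,
# and a rebuild that takes either 24 hours or 12 hours.
long_window = p_additional_failure(n_disks=12, window_hours=24.0, mttf_hours=1_000_000.0)
short_window = p_additional_failure(n_disks=12, window_hours=12.0, mttf_hours=1_000_000.0)

print(f"24 h window: {long_window:.6e}")
print(f"12 h window: {short_window:.6e}")
```

Under this simplified model the exposure shrinks roughly in proportion to the window length, which is the intuition behind shortening the single-failure rebuild.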
Acknowledgment
We are grateful to our anonymous reviewers for their suggestions to improve this paper. This work is supported by the National Natural Science Foundation of China under Grant Nos. 61232003, 61332003, 61202121, 61402503, 61303073.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Deng, MZ. et al. (2015). RAID-6Plus: A Fast and Reliable Coding Scheme Aided by Multi-failure Degradation. In: Yao, L., Xie, X., Zhang, Q., Yang, L., Zomaya, A., Jin, H. (eds) Advances in Services Computing. APSCC 2015. Lecture Notes in Computer Science, vol 9464. Springer, Cham. https://doi.org/10.1007/978-3-319-26979-5_15
DOI: https://doi.org/10.1007/978-3-319-26979-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26978-8
Online ISBN: 978-3-319-26979-5
eBook Packages: Computer Science (R0)