Thou code: a triple-erasure-correcting horizontal code with optimal update complexity

The Journal of Supercomputing

Abstract

Because they consume little storage, erasure codes have been widely adopted by storage systems such as HDFS, Ceph, and Microsoft Azure to provide reliability. As data volumes grow rapidly, protecting against co-occurring failures becomes increasingly important. RAID-6 codes address multi-disk faults to some extent, but they are prone to catastrophic data loss if two disks are held on the same shelf and that shelf suffers an unrecoverable failure. Triple disk failure tolerant (3DFT) codes, which tolerate more concurrent faults, are therefore increasingly popular. Existing 3DFT codes have either low computational efficiency, e.g., Reed–Solomon (RS) codes and minimum storage regenerating (MSR) codes, or high write overhead, e.g., Cauchy RS (CRS) codes, STAR codes, and RTP codes. In this paper, we propose a family of lowest-density XOR-based 3DFT codes, named Thou codes, which offer the lowest storage overhead, high encoding and decoding performance, and optimal write overhead. Thou codes are constructed by an efficient search over a set of matrices derived from Liberation codes, a class of lowest-density RAID-6 codes. Experimental results show that Thou codes encode and decode efficiently and outperform three state-of-the-art 3DFT codes in write performance. Under real I/O workloads, Thou codes reduce write overhead by 21.9% and 26.8% on average compared with STAR codes and RTP codes, respectively.
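To make the notion of write overhead concrete, the following minimal sketch shows a read-modify-write update for a generic XOR-based code. It is not the Thou construction: the names `update_element` and `encode` and the toy dependency sets are assumptions made purely for illustration. Every parity element that depends on the changed data element must be rewritten, so the count returned by `update_element` is the per-update cost. In any code that tolerates three disk failures, each data element must contribute to at least three parity elements, which is why three is the optimum.

```python
def update_element(data, parity, deps, idx, new_value):
    """Overwrite data[idx] and patch every parity element that depends on it.

    data, parity: lists of ints used as bit vectors (hypothetical layout).
    deps: deps[j] is the set of data indices XORed into parity[j].
    Returns the number of parity elements rewritten.
    """
    delta = data[idx] ^ new_value      # bits that actually change
    data[idx] = new_value
    touched = 0
    for j, members in enumerate(deps):
        if idx in members:
            parity[j] ^= delta         # parity' = parity XOR old XOR new
            touched += 1
    return touched


def encode(data, deps):
    """Recompute all parity elements from scratch (used as a reference)."""
    out = []
    for members in deps:
        acc = 0
        for i in members:
            acc ^= data[i]
        out.append(acc)
    return out


# Toy layout: 4 data elements, 3 parity elements (not a real 3DFT code).
data = [0x0F, 0x33, 0x55, 0xA0]
deps = [{0, 1, 2, 3}, {0, 2}, {0, 1, 3}]
parity = encode(data, deps)
touched = update_element(data, parity, deps, 0, 0xFF)
```

In this toy layout, data element 0 appears in all three parity elements while element 1 appears in only two, so the cost of an update depends on which element is written. The incremental patch yields the same parity as a full re-encode.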

Notes

  1. Write performance is determined by the average number of parity elements that must be modified when a single data element, or several consecutive data elements, are updated. It is inversely proportional to write overhead (also called write complexity).

  2. The terms disk and node are used interchangeably in this paper.

  3. A permutation matrix is a square binary matrix with exactly one 1 in each row and each column and 0s elsewhere.
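The quantities defined in Notes 1 and 3 can be checked mechanically. The sketch below is illustrative only: the function names and the small example matrices are assumptions, not taken from the paper, and `average_parity_updates` covers only the single-element update case of Note 1.

```python
def is_permutation_matrix(m):
    """True if m is square, 0/1-valued, with exactly one 1 per row and column."""
    n = len(m)
    if n == 0 or any(len(row) != n for row in m):
        return False
    if any(v not in (0, 1) for row in m for v in row):
        return False
    rows_ok = all(sum(row) == 1 for row in m)
    cols_ok = all(sum(m[r][c] for r in range(n)) == 1 for c in range(n))
    return rows_ok and cols_ok


def average_parity_updates(parity_rows):
    """Average number of parity elements modified per single data-element update.

    parity_rows[j][i] == 1 means parity element j depends on data element i
    (a hypothetical dependency matrix, not one of the published codes).
    """
    n_data = len(parity_rows[0])
    per_element = [sum(row[i] for row in parity_rows) for i in range(n_data)]
    return sum(per_element) / n_data


# A cyclic shift is a permutation matrix; a matrix with an empty row is not.
shift = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
not_perm = [[1, 1],
            [0, 0]]

# Toy dependency matrix: 3 parity elements over 4 data elements.
toy = [[1, 1, 1, 1],
       [1, 0, 1, 0],
       [1, 1, 0, 1]]
avg = average_parity_updates(toy)
```

A lower average means fewer parity writes per data update.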

Acknowledgements

This research has been supported by the China National Key R&D Program during the 13th Five-year Plan Period (Grant No. 2018YFB1700405).

Author information

Corresponding author

Correspondence to Ningjing Liang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

Table 4 lists the Thou codes for \(w \le 43\). The specification of \(X_i\) and \(Z_i\) follows the convention defined in Sect. 4.3, where \(X_i = (\varPi_0, \varPi_1, \ldots, \varPi_{k-1})\) and \(Z_i = (\varGamma_0, \varGamma_1, \ldots, \varGamma_{k-1})\).

Table 3 Percentage of performance improvement for Thou codes in comparison with other codes
Table 4 Thou codes when \(w \le 43\)

About this article

Cite this article

Liang, N., Zhang, X., Chen, H. et al. Thou code: a triple-erasure-correcting horizontal code with optimal update complexity. J Supercomput 78, 10088–10117 (2022). https://doi.org/10.1007/s11227-021-04271-9

  • DOI: https://doi.org/10.1007/s11227-021-04271-9