Abstract
Practical storage systems often adopt erasure codes to tolerate device failures and sector failures, both of which are prevalent in the field. However, traditional erasure codes employ device-level redundancy to protect against sector failures, and hence incur significant space overhead. Recent sector-disk (SD) codes are available only for limited configurations. By making a relaxed but practical assumption, we construct a general family of erasure codes called STAIR codes, which efficiently and provably tolerate both device and sector failures without any restriction on the size of a storage array and the numbers of tolerable device failures and sector failures. We propose the upstairs encoding and downstairs encoding methods, which provide complementary performance advantages for different configurations. We conduct extensive experiments on STAIR codes in terms of space saving, encoding/decoding speed, and update cost. We demonstrate that STAIR codes not only improve space efficiency over traditional erasure codes, but also provide better computational efficiency than SD codes based on our special code construction. Finally, we present analytical models that characterize the reliability of STAIR codes, and show that the support of a wider range of configurations by STAIR codes is critical for tolerating sector failure bursts discovered in the field.
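The space-overhead argument can be made concrete with a small accounting sketch. It is an illustration under stated assumptions, not the paper's construction: it assumes a stripe of `n` devices with `r` sectors per device, tolerance of `m` device failures, and `s` sector failures confined to at most `m_extra` further devices. All function and parameter names are hypothetical.

```python
def device_level_parity_sectors(r, m, m_extra):
    """Parity sectors per stripe when sector failures are covered by
    dedicating whole extra parity devices (assumed: one per affected device)."""
    return (m + m_extra) * r


def sector_level_parity_sectors(r, m, s):
    """Parity sectors per stripe when sector failures are covered by
    s individual parity sectors on top of m parity devices (assumed layout)."""
    return m * r + s


def storage_efficiency(n, r, parity_sectors):
    """Fraction of the stripe's n * r sectors that holds user data."""
    total = n * r
    return (total - parity_sectors) / total


if __name__ == "__main__":
    # Hypothetical configuration chosen only for illustration.
    n, r, m, m_extra, s = 8, 16, 1, 1, 2
    dev = device_level_parity_sectors(r, m, m_extra)
    sec = sector_level_parity_sectors(r, m, s)
    print("device-level parity sectors:", dev)
    print("sector-level parity sectors:", sec)
    print("sectors saved per stripe:", dev - sec)
    print("efficiency (device-level):", storage_efficiency(n, r, dev))
    print("efficiency (sector-level):", storage_efficiency(n, r, sec))
```

Under these assumptions the saving per stripe is `m_extra * r - s` sectors, so it grows with the number of sectors per device and shrinks as more sector failures must be tolerated.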