Abstract
Erasure codes in large-scale storage systems allow recovery of data from a failed node. A recently developed class of codes, locally repairable codes (LRCs), offers tradeoffs between storage overhead and repair cost. LRCs facilitate efficient recovery scenarios by adding parity blocks to the system. However, these additional blocks may eventually increase the number of blocks that must be reconstructed. Existing LRCs differ in their use of the parity blocks, in their locality semantics, and in their parameter space. Thus, existing theoretical models cannot directly compare different LRCs to determine which code offers the best recovery performance, and at what cost.
We perform the first systematic comparison of existing LRC approaches. We analyze Xorbas, Azure’s LRCs, and Optimal-LRCs in light of two new metrics: average degraded read cost and normalized repair cost. We show the tradeoff between these costs and the code’s fault tolerance, and that different approaches offer different choices in this tradeoff. Our experimental evaluation on a Ceph cluster further demonstrates the different effects of realistic system bottlenecks on the benefit from each LRC approach. Despite these differences, the normalized repair cost metric can reliably identify the LRC approach that would achieve the lowest repair cost in each setup.