Geo-aware erasure coding for high-performance erasure-coded storage clusters

Mohan, Lakshmi J.; Caneleo, Pablo Ignacio Serrano; Parampalli, Udaya; Harwood, Aaron

doi:10.1007/s12243-017-0623-2

Published: 18 January 2018

Volume 73, pages 139–152, (2018)

Annals of Telecommunications

Abstract

Erasure code-based distributed storage systems are increasingly used by storage providers for big data storage because they offer the same reliability as replication with a significant decrease in the amount of storage required. However, in a storage system whose data nodes are spread across a very large geographical area, a node’s recovery performance is affected by various factors, both network-related and computation-related. In this paper, we present an XOR-based code, supplemented with the ideas of parity duplication and rack awareness, that could be adopted in such storage clusters to improve recovery performance during node failures. We compare it with popular implementations of erasure codes, namely Facebook’s Reed-Solomon codes and XORBAS local recovery codes. The performance of the code, along with the proposed ideas, is evaluated on a geo-diverse cluster deployed on the NeCTAR research cloud. We also present a scheme, inspired by local reconstruction codes, for intelligently placing blocks of coded storage depending on the design of the code. Together, these propositions could offer a better solution for applications deployed on geographically distributed coded storage systems, in which storage constraints make triple replication unaffordable while minimal recovery time remains a strict requirement.

References

[1] Corbett JC, Dean J et al (2012) Spanner: Google’s globally-distributed database. In: Proceedings of the 10th USENIX conference on operating systems design and implementation. OSDI’12. USENIX Association, Berkeley, pp 251–264
[2] Evangelinos C, Hill CN (2008) Cloud computing for parallel scientific HPC applications: feasibility of running coupled atmosphere-ocean climate models on Amazon’s EC2. Cloud computing and its applications
[3] Li J, Humphrey M, Agarwal DA, Jackson KR, van Ingen C, Ryu Y. eScience in the cloud: a MODIS satellite data reprojection and reduction pipeline in the Windows Azure platform. In: IPDPS. IEEE, pp 1–10
[4] Rashmi KV, Shah NB, Gu D, Kuang H, Borthakur D, Ramchandran K (2014) A “hitchhiker’s” guide to fast and efficient data reconstruction in erasure-coded data centers. SIGCOMM Comput Commun Rev 44(4):331–342
[5] Sathiamoorthy M, Asteris M, Papailiopoulos D, Dimakis AG, Vadali R, Chen S, Borthakur D (2013) XORing elephants: novel erasure codes for big data. In: Proceedings of the VLDB endowment
[6] Xia M, Saxena M, Blaum M, Pease DA (2015) A tale of two erasure codes in HDFS. In: Proceedings of the 13th USENIX conference on file and storage technologies. FAST’15. USENIX Association, Berkeley, pp 213–226
[7] Huang C, Simitci H, Xu Y, Ogus A, Calder B, Gopalan P, Li J, Yekhanin S (2012) Erasure coding in Windows Azure storage. In: Proceedings of the 2012 USENIX conference on annual technical conference. USENIX ATC’12. USENIX Association, Berkeley, pp 2–2
[8] HDFS-RAID. http://wiki.apache.org/hadoop/HDFS-RAID. [Online; accessed 10-5-2017]
[9] Li R, Lin J, Lee PPC (2013) CORE: augmenting regenerating-coding-based recovery for single and concurrent failures in distributed storage systems. CoRR, 1302.3344
[10] Esmaili KS, Pamies-Juarez L, Datta A (2013) The CORE storage primitive: cross-object redundancy for efficient data repair & access in erasure coded storage. CoRR, 1302.5192
[11] Mohan LJ, Harold RL, Serrano Caneleo PI, Parampalli U, Harwood A (2015) Benchmarking the performance of Hadoop triple replication and erasure coding on a nation-wide distributed cloud. In: 2015 International symposium on network coding, NetCod 2015, Sydney, Australia, June 22–24, pp 61–65
[12] NeCTAR. https://www.nectar.org.au. [Online; accessed 10-5-2017]
[13] Facebook Hadoop-20. https://github.com/facebookarchive/hadoop-20. [Online; accessed 10-5-2017]
[14] Hadoop-USC. https://github.com/madiator/HadoopUSC. [Online; accessed 10-5-2017]
[15] Blaum M, Brady J, Bruck J, Menon J (1995) EVENODD: an efficient scheme for tolerating double disk failures in RAID architectures. IEEE Trans Comput 44(2):192–202
[16] Cadambe VR, Jafar SA, Maleki H, Ramchandran K, Suh C (2013) Asymptotic interference alignment for optimal repair of MDS codes in distributed storage. IEEE Trans Inf Theory 59(5):2974–2987
[17] Wang Z, Dimakis AG, Bruck J (2010) Rebuilding for array codes in distributed storage systems. In: 2010 IEEE Globecom Workshops, pp 1905–1909
[18] Xiang L, Xu Y, Lui JCS, Chang Q (2010) Optimal recovery of single disk failure in RDP code storage systems. In: Proceedings of the ACM SIGMETRICS international conference on measurement and modeling of computer systems. SIGMETRICS ’10. ACM, New York, pp 119–130
[19] Huang C, Chen M, Li J (2013) Pyramid codes: flexible schemes to trade space for access efficiency in reliable data storage systems. Trans Storage 9(1):3:1–3:28
[20] Gopalan P, Huang C, Simitci H, Yekhanin S (2012) On the locality of codeword symbols. IEEE Trans Inf Theory 58(11):6925–6934
[21] Kamath GM, Prakash N, Lalitha V, Kumar PV (2014) Codes with local regeneration and erasure correction. IEEE Trans Inf Theory 60(8):4637–4660
[22] Pamies-Juarez L, Hollmann HDL, Oggier F (2013) Locally repairable codes with multiple repair alternatives. In: 2013 IEEE international symposium on information theory, pp 892–896
[23] Silberstein N, Rawat AS, Vishwanath S (2015) Error-correcting regenerating and locally repairable codes via rank-metric codes. IEEE Trans Inf Theory 61(11):5765–5778
[24] Tamo I, Barg A (2014) A family of optimal locally recoverable codes. IEEE Trans Inf Theory 60(8):4661–4676
[25] Gong Q, Wang J, Wei D, Wang J, Wang X (2015) Optimal node selection for data regeneration in heterogeneous distributed storage systems. In: 2015 44th international conference on parallel processing, pp 390–399
[26] Muralidhar S, Lloyd W, Roy S, Hill C, Lin E, Liu W, Pan S, Shankar S, Sivakumar V, Tang L, Kumar S (2014) f4: Facebook’s warm blob storage system. In: 11th USENIX symposium on operating systems design and implementation (OSDI 14). USENIX Association, Broomfield, pp 383–398
[27] Reed IS, Solomon G (1960) Polynomial codes over certain finite fields. J Soc Ind Appl Math 8(2):300–304
[28] Jerasure. http://lab.jerasure.org/jerasure/jerasure. [Online; accessed 10-5-2017]
[29] Ghemawat S, Gobioff H, Leung S-T (2003) The Google file system. In: Proceedings of the nineteenth ACM symposium on operating systems principles. SOSP ’03. ACM, New York, pp 29–43
[30] Data Replication - HDFS Architecture guide. https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Data+Replication. [Online; accessed 10-5-2017]
[31] Rashmi KV, Shah NB, Gu D, Kuang H, Borthakur D, Ramchandran K (2013) A solution to the network challenges of data recovery in erasure-coded distributed storage systems: a study on the Facebook warehouse cluster. In: Proceedings of the 5th USENIX conference on hot topics in storage and file systems. HotStorage’13. USENIX Association, Berkeley, pp 8–8
[32] Krishnan MN, Prakash N, Lalitha V, Sasidharan B, Kumar PV, Narayanamurthy S, Kumar R, Nandi S (2014) Evaluation of codes with inherent double replication for Hadoop. In: 6th USENIX workshop on hot topics in storage and file systems (HotStorage 14). USENIX Association, Philadelphia
[33] Li J, Yang S, Wang X, Li B (2010) Tree-structured data regeneration in distributed storage systems with regenerating codes. In: INFOCOM, 2010 Proceedings IEEE, pp 1–9
[34] Dimakis AG, Ramchandran K, Wu Y, Suh C (2011) A survey on network codes for distributed storage. Proc IEEE 99(3):476–489
[35] Rack awareness. https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/RackAwareness.html. [Online; accessed 10-5-2017]

Acknowledgements

This research was supported by use of the Nectar Research Cloud, a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy (NCRIS). This work was supported by Data61/CSIRO and ARC discovery project DP150104473.

Author information

Authors and Affiliations

Department of Computing and Information Systems, University of Melbourne, Parkville, Australia
Lakshmi J. Mohan, Pablo Ignacio Serrano Caneleo, Udaya Parampalli & Aaron Harwood

Corresponding author

Correspondence to Lakshmi J. Mohan.

Appendix

1.1 Location Awareness

This section briefly introduces the notion of topology awareness with a custom network layout. The Hadoop software framework provides a ready-to-use and flexible implementation of custom network topologies based on bash scripts. It reuses the rack awareness mechanism designed for a data center to implement topology awareness across a cluster. To configure a custom geographic topology, some lines must be added to the Hadoop configuration file core-site.xml. The following lines explain the complete process:

The bash script used in our experiments is based on the standard Hadoop topology awareness code provided on the Hadoop Wiki [35]. We modified this code, drawing on other community references, to produce the following script for our geo-diverse cluster:

The above script reads a topology information file “rack_toplology.data” that specifies racks (in our case, locations) and machines in a key-value relationship using a simple format. Given that our clusters were distributed around Australia, it was natural to organize the racks according to the locations available on the NeCTAR cloud.
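
As an illustration only, the mapping step that such a topology script performs can be sketched as follows. This is not the authors' bash script: it is a minimal Python sketch that assumes a data file with one "host location" pair per line. The file name topology.data, the location labels, and the function names are hypothetical. Hadoop runs the script named by the net.topology.script.file.name property (topology.script.file.name in older releases), passes one or more host names or IP addresses as arguments, and expects one rack path per argument on standard output.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop topology script: one rack path per host argument."""
import sys

DEFAULT_RACK = "/default-rack"  # conventional fallback for unknown hosts
TOPOLOGY_FILE = "topology.data"  # hypothetical path, adjust as needed


def load_topology(lines):
    """Parse 'host rack' pairs, ignoring blank lines and '#' comments."""
    mapping = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def resolve(hosts, mapping, default=DEFAULT_RACK):
    """Return the rack (here: geographic location) for each host, in order."""
    return [mapping.get(host, default) for host in hosts]


def main(hosts, path=TOPOLOGY_FILE):
    try:
        with open(path, encoding="utf-8") as handle:
            mapping = load_topology(handle)
    except OSError:
        mapping = {}  # no file: every host falls back to the default rack
    for rack in resolve(hosts, mapping):
        print(rack)


if __name__ == "__main__":
    main(sys.argv[1:])
```

With a data file containing, say, the hypothetical lines "10.0.0.1 /melbourne" and "10.0.1.1 /tasmania", calling the script with both addresses would print the two location paths in the same order. Hosts missing from the file fall back to the default rack, so an incomplete file degrades placement quality instead of failing the lookup.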

About this article

Cite this article

Mohan, L.J., Caneleo, P.I.S., Parampalli, U. et al. Geo-aware erasure coding for high-performance erasure-coded storage clusters. Ann. Telecommun. 73, 139–152 (2018). https://doi.org/10.1007/s12243-017-0623-2

Received: 15 September 2016
Accepted: 18 December 2017
Published: 18 January 2018
Issue Date: February 2018
DOI: https://doi.org/10.1007/s12243-017-0623-2
