Abstract
Erasure-code-based distributed storage systems are increasingly used by storage providers for big data storage because they offer the same reliability as replication while requiring significantly less storage. However, when the data nodes of a storage system are spread across a very large geographical area, node recovery performance is affected by both network-related and computation-related factors. In this paper, we present an XOR-based code, supplemented with the ideas of parity duplication and rack awareness, that could be adopted in such storage clusters to improve recovery performance during node failures. We compare it with popular implementations of erasure codes, namely Facebook’s Reed-Solomon codes and XORBAS local recovery codes. The code’s performance and the proposed ideas are evaluated on a geo-diverse cluster deployed on the NeCTAR research cloud. We also present a scheme, inspired by local reconstruction codes, for intelligently placing blocks of coded storage depending on the design of the code. Together, these propositions could offer a better solution for applications deployed on geographically distributed coded storage systems, in which storage constraints make triple replication unaffordable while minimal recovery time remains a strict requirement.
References
Corbett JC, Dean J et al (2012) Spanner: Google’s globally-distributed database. In: Proceedings of the 10th USENIX conference on operating systems design and implementation. OSDI’12. USENIX Association, Berkeley, pp 251–264
Evangelinos C, Hill CN (2008) Cloud computing for parallel scientific HPC applications: feasibility of running coupled atmosphere-ocean climate models on amazon’s EC2. Cloud computing and its applications
Li J, Humphrey M, Agarwal DA, Jackson KR, van Ingen C, Ryu Y. Escience in the cloud: a modis satellite data reprojection and reduction pipeline in the windows azure platform. In: IPDPS. IEEE, pp 1–10
Rashmi KV, Shah NB, Gu D, Kuang H, Borthakur D, Ramchandran K (2014) A “hitchhiker’s” guide to fast and efficient data reconstruction in erasure-coded data centers. SIGCOMM Comput Commun Rev 44 (4):331–342
Sathiamoorthy M, Asteris M, Papailiopoulos D, Dimakis AG, Vadali R, Chen S, Borthakur D (2013) Xoring elephants: novel erasure codes for big data. In: Proceedings of the VLDB endowment
Xia M, Saxena M, Blaum M, Pease DA (2015) A tale of two erasure codes in hdfs. In: Proceedings of the 13th USENIX conference on file and storage technologies. FAST’15. USENIX Association, Berkeley, pp 213–226
Huang C, Simitci H, Xu Y, Ogus A, Calder B, Gopalan P, Li J, Yekhanin S (2012) Erasure coding in windows azure storage. In: Proceedings of the 2012 USENIX conference on annual technical conference. USENIX ATC’12. USENIX Association, Berkeley, pp 2–2
HDFS-RAID. http://wiki.apache.org/hadoop/HDFS-RAID. [Online; Accessed 10-5-2017]
Li R, Lin J, Lee PPC (2013) CORE: augmenting regenerating-coding-based recovery for single and concurrent failures in distributed storage systems. CoRR, 1302.3344
Esmaili KS, Pamies-Juarez L, Datta A (2013) The CORE storage primitive: cross-object redundancy for efficient data repair & access in erasure coded storage. CoRR, 1302.5192
Mohan LJ, Harold RL, Serrano Caneleo PI, Parampalli U, Harwood A (2015) Benchmarking the performance of hadoop triple replication and erasure coding on a nation-wide distributed cloud. In: 2015 International symposium on network coding, NetCod 2015, Sydney, Australia, June 22–24, pp 61–65
NeCTAR. https://www.nectar.org.au. [Online; Accessed 10-5-2017]
Facebook Hadoop-20. https://github.com/facebookarchive/hadoop-20. [Online; Accessed 10-5-2017]
Hadoop-USC. https://github.com/madiator/HadoopUSC. [Online; accessed 10-5-2017]
Blaum M, Brady J, Bruck J, Menon J (1995) Evenodd: an efficient scheme for tolerating double disk failures in raid architectures. IEEE Trans Comput 44(2):192–202
Cadambe VR, Jafar SA, Maleki H, Ramchandran K, Suh C (2013) Asymptotic interference alignment for optimal repair of mds codes in distributed storage. IEEE Trans Inf Theory 59(5):2974–2987
Wang Z, Dimakis AG, Bruck J (2010) Rebuilding for array codes in distributed storage systems. In: 2010 IEEE Globecom Workshops, pp 1905–1909
Xiang L, Xu Y, Lui JCS, Chang Q (2010) Optimal recovery of single disk failure in rdp code storage systems. In: Proceedings of the ACM SIGMETRICS international conference on measurement and modeling of computer systems. SIGMETRICS ’10. ACM, New York, pp 119–130
Huang C, Chen M, Li J (2013) Pyramid codes: flexible schemes to trade space for access efficiency in reliable data storage systems. Trans Storage 9(1):3:1–3:28
Gopalan P, Huang C, Simitci H, Yekhanin S (2012) On the locality of codeword symbols. IEEE Trans Inf Theory 58(11):6925–6934
Kamath GM, Prakash N, Lalitha V, Kumar PV (2014) Codes with local regeneration and erasure correction. IEEE Trans Inf Theory 60(8):4637–4660
Pamies-Juarez L, Hollmann HDL, Oggier F (2013) Locally repairable codes with multiple repair alternatives. In: 2013 IEEE international symposium on information theory, pp 892–896
Silberstein N, Rawat AS, Vishwanath S (2015) Error-correcting regenerating and locally repairable codes via rank-metric codes. IEEE Trans Inf Theory 61(11):5765–5778
Tamo I, Barg A (2014) A family of optimal locally recoverable codes. IEEE Trans Inf Theory 60(8):4661–4676
Gong Q, Wang J, Wei D, Wang J, Wang X (2015) Optimal node selection for data regeneration in heterogeneous distributed storage systems. In: 2015 44th international conference on parallel processing, pp 390–399
Muralidhar S, Lloyd W, Roy S, Hill C, Lin E, Liu W, Pan S, Shankar S, Sivakumar V, Tang L, Kumar S (2014) f4: Facebook’s warm blob storage system. In: 11th USENIX symposium on operating systems design and implementation (OSDI 14). USENIX Association, Broomfield, pp 383–398
Reed IS, Solomon G (1960) Polynomial codes over certain finite fields. J Soc Ind Appl Math 8(2):300–304
Jerasure. http://lab.jerasure.org/jerasure/jerasure. [Online; accessed 10-5-2017]
Ghemawat S, Gobioff H, Leung S-T (2003) The google file system. In: Proceedings of the nineteenth ACM symposium on operating systems principles. SOSP ’03. ACM, New York, pp 29–43
Data Replication- HDFS Architecture guide. https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Data+Replication. [Online; Accessed 10-5-2017]
Rashmi KV, Shah NB, Gu D, Kuang H, Borthakur D, Ramchandran K (2013) A solution to the network challenges of data recovery in erasure-coded distributed storage systems: a study on the facebook warehouse cluster. In: Proceedings of the 5th USENIX conference on hot topics in storage and file systems. HotStorage’13. USENIX Association, Berkeley, pp 8–8
Krishnan MN, Prakash N, Lalitha V, Sasidharan B, Kumar PV, Narayanamurthy S, Kumar R, Nandi S (2014) Evaluation of codes with inherent double replication for hadoop. In: 6th USENIX workshop on hot topics in storage and file systems (HotStorage 14). USENIX Association, Philadelphia
Li J, Yang S, Wang X, Li B (2010) Tree-structured data regeneration in distributed storage systems with regenerating codes. In: INFOCOM, 2010 Proceedings IEEE, pp 1–9
Dimakis AG, Ramchandran K, Wu Y, Suh C (2011) A survey on network codes for distributed storage. Proc IEEE 99(3):476–489
Rack awareness. https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/RackAwareness.html. [Online; accessed 10-5-2017]
Acknowledgements
This research was supported by use of the Nectar Research Cloud, a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy (NCRIS). This work was supported by Data61/CSIRO and ARC discovery project DP150104473.
Appendix
1.1 Location Awareness
This section briefly introduces topology awareness with a custom network layout. The Hadoop framework provides a ready-to-use, flexible mechanism for defining custom network topologies through bash scripts. It reuses the rack awareness mechanism designed for data centers to make a cluster topology aware. To configure a custom geographic topology, a few lines must be added to the Hadoop configuration file core-site.xml. The following lines explain the complete process:
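As a hedged illustration of such a configuration entry, the Python sketch below builds the property element that points Hadoop at a topology script. The property name shown is the one used by Hadoop 2.x and later (0.20 and 1.x releases use topology.script.file.name instead); the script path and the helper names are placeholders assumed for this example, not values taken from the experiments.

```python
import xml.etree.ElementTree as ET

# Assumed values: the property name applies to Hadoop 2.x and later
# (0.20/1.x releases use "topology.script.file.name"); the path is a
# hypothetical placeholder.
PROPERTY_NAME = "net.topology.script.file.name"
SCRIPT_PATH = "/path/to/topology_script.sh"


def topology_property(name, script_path):
    """Build the <property> element that registers a topology script."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = script_path
    return prop


def render(prop):
    """Serialize the element for pasting inside <configuration> in core-site.xml."""
    return ET.tostring(prop, encoding="unicode")


print(render(topology_property(PROPERTY_NAME, SCRIPT_PATH)))
```

The printed element is meant to be placed inside the configuration element of core-site.xml on the NameNode.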
The bash script used in our experiments is based on the standard Hadoop topology awareness code provided on the Hadoop Wiki [35]. We modified that code, drawing on other community references, to produce the following script for our geo-diverse cluster:
The above script reads a topology information file, “rack_toplology.data”, that lists machines and their racks (in our case, locations) as key-value pairs in a simple format. Given that our clusters were distributed around Australia, it was natural to organize the racks according to the locations available on the NeCTAR cloud.
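The lookup described above can be sketched as follows. This is a Python illustration, not the bash script used in the experiments (Hadoop accepts any executable as a topology script). The file name is copied from the text; the default rack, the one-pair-per-line layout with optional # comments, and the function names are assumptions made for this example.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop topology script: maps hosts to location 'racks'."""
import sys

# File name as spelled in the text; the default rack is an assumed placeholder.
TOPOLOGY_FILE = "rack_toplology.data"
DEFAULT_RACK = "/default-rack"


def parse_topology(text):
    """Parse 'host rack' pairs, one per line; '#' starts a comment (assumed format)."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            table[fields[0]] = fields[1]
    return table


def resolve(hosts, table):
    """Return one rack path per host, in the same order as the arguments."""
    return [table.get(host, DEFAULT_RACK) for host in hosts]


def main(argv):
    try:
        with open(TOPOLOGY_FILE) as handle:
            table = parse_topology(handle.read())
    except OSError:
        table = {}  # Unreadable file: every host falls back to DEFAULT_RACK.
    # Hadoop passes one or more IPs/hostnames and reads the racks from stdout.
    print(" ".join(resolve(argv, table)))


if __name__ == "__main__":
    main(sys.argv[1:])
```

With a topology file containing the illustrative line "10.0.0.2 /sydney", calling the script with the arguments 10.0.0.2 and 10.0.0.9 would print "/sydney /default-rack": the unknown host falls back to the default rack. The addresses and location name here are made up for the example.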
Cite this article
Mohan, L.J., Caneleo, P.I.S., Parampalli, U. et al. Geo-aware erasure coding for high-performance erasure-coded storage clusters. Ann. Telecommun. 73, 139–152 (2018). https://doi.org/10.1007/s12243-017-0623-2
DOI: https://doi.org/10.1007/s12243-017-0623-2