Geo-aware erasure coding for high-performance erasure-coded storage clusters

Annals of Telecommunications

Abstract

Erasure code-based distributed storage systems are increasingly used by storage providers for big data storage, since they offer the same reliability as replication with a significant decrease in the amount of storage required. However, in a storage system whose data nodes are spread across a very large geographical area, node recovery performance is affected by both network-related and computation-related factors. In this paper, we present an XOR-based code, supplemented with the ideas of parity duplication and rack awareness, that could be adopted in such storage clusters to improve recovery performance during node failures, and we compare it with popular implementations of erasure codes, namely Facebook’s Reed-Solomon codes and XORBAS local recovery codes. The code performance, along with the proposed ideas, is evaluated on a geo-diverse cluster deployed on the NeCTAR research cloud. We also present a scheme, inspired by local reconstruction codes, for intelligently placing blocks of coded storage depending on the design of the code. Taken together, these propositions could offer a better solution for applications deployed on geographically distributed coded storage systems in which storage constraints make triple replication unaffordable while minimal recovery time remains a strict requirement.

References

  1. Corbett JC, Dean J et al (2012) Spanner: Google’s globally-distributed database. In: Proceedings of the 10th USENIX conference on operating systems design and implementation. OSDI’12. USENIX Association, Berkeley, pp 251–264

  2. Evangelinos C, Hill CN (2008) Cloud computing for parallel scientific HPC applications: feasibility of running coupled atmosphere-ocean climate models on Amazon’s EC2. Cloud computing and its applications

  3. Li J, Humphrey M, Agarwal DA, Jackson KR, van Ingen C, Ryu Y. eScience in the cloud: a MODIS satellite data reprojection and reduction pipeline in the Windows Azure platform. In: IPDPS. IEEE, pp 1–10

  4. Rashmi KV, Shah NB, Gu D, Kuang H, Borthakur D, Ramchandran K (2014) A “hitchhiker’s” guide to fast and efficient data reconstruction in erasure-coded data centers. SIGCOMM Comput Commun Rev 44(4):331–342

  5. Sathiamoorthy M, Asteris M, Papailiopoulos D, Dimakis AG, Vadali R, Chen S, Borthakur D (2013) Xoring elephants: novel erasure codes for big data. In: Proceedings of the VLDB endowment

  6. Xia M, Saxena M, Blaum M, Pease DA (2015) A tale of two erasure codes in HDFS. In: Proceedings of the 13th USENIX conference on file and storage technologies. FAST’15. USENIX Association, Berkeley, pp 213–226

  7. Huang C, Simitci H, Xu Y, Ogus A, Calder B, Gopalan P, Li J, Yekhanin S (2012) Erasure coding in Windows Azure Storage. In: Proceedings of the 2012 USENIX conference on annual technical conference. USENIX ATC’12. USENIX Association, Berkeley, pp 2–2

  8. HDFS-RAID. http://wiki.apache.org/hadoop/HDFS-RAID. [Online; Accessed 10-5-2017]

  9. Li R, Lin J, Lee PPC (2013) CORE: augmenting regenerating-coding-based recovery for single and concurrent failures in distributed storage systems. CoRR, 1302.3344

  10. Esmaili KS, Pamies-Juarez L, Datta A (2013) The CORE storage primitive: cross-object redundancy for efficient data repair & access in erasure coded storage. CoRR, 1302.5192

  11. Mohan LJ, Harold RL, Serrano Caneleo PI, Parampalli U, Harwood A (2015) Benchmarking the performance of Hadoop triple replication and erasure coding on a nation-wide distributed cloud. In: 2015 International symposium on network coding, NetCod 2015, Sydney, Australia, June 22–24, pp 61–65

  12. NeCTAR. https://www.nectar.org.au. [Online; Accessed 10-5-2017]

  13. Facebook Hadoop-20. https://github.com/facebookarchive/hadoop-20. [Online; Accessed 10-5-2017]

  14. Hadoop-USC. https://github.com/madiator/HadoopUSC. [Online; Accessed 10-5-2017]

  15. Blaum M, Brady J, Bruck J, Menon J (1995) EVENODD: an efficient scheme for tolerating double disk failures in RAID architectures. IEEE Trans Comput 44(2):192–202

  16. Cadambe VR, Jafar SA, Maleki H, Ramchandran K, Suh C (2013) Asymptotic interference alignment for optimal repair of MDS codes in distributed storage. IEEE Trans Inf Theory 59(5):2974–2987

  17. Wang Z, Dimakis AG, Bruck J (2010) Rebuilding for array codes in distributed storage systems. In: 2010 IEEE Globecom Workshops, pp 1905–1909

  18. Xiang L, Xu Y, Lui JCS, Chang Q (2010) Optimal recovery of single disk failure in RDP code storage systems. In: Proceedings of the ACM SIGMETRICS international conference on measurement and modeling of computer systems. SIGMETRICS ’10. ACM, New York, pp 119–130

  19. Huang C, Chen M, Li J (2013) Pyramid codes: flexible schemes to trade space for access efficiency in reliable data storage systems. Trans Storage 9(1):3:1–3:28

  20. Gopalan P, Huang C, Simitci H, Yekhanin S (2012) On the locality of codeword symbols. IEEE Trans Inf Theory 58(11):6925–6934

  21. Kamath GM, Prakash N, Lalitha V, Kumar PV (2014) Codes with local regeneration and erasure correction. IEEE Trans Inf Theory 60(8):4637–4660

  22. Pamies-Juarez L, Hollmann HDL, Oggier F (2013) Locally repairable codes with multiple repair alternatives. In: 2013 IEEE international symposium on information theory, pp 892–896

  23. Silberstein N, Rawat AS, Vishwanath S (2015) Error-correcting regenerating and locally repairable codes via rank-metric codes. IEEE Trans Inf Theory 61(11):5765–5778

  24. Tamo I, Barg A (2014) A family of optimal locally recoverable codes. IEEE Trans Inf Theory 60(8):4661–4676

  25. Gong Q, Wang J, Wei D, Wang J, Wang X (2015) Optimal node selection for data regeneration in heterogeneous distributed storage systems. In: 2015 44th international conference on parallel processing, pp 390–399

  26. Muralidhar S, Lloyd W, Roy S, Hill C, Lin E, Liu W, Pan S, Shankar S, Sivakumar V, Tang L, Kumar S (2014) f4: Facebook’s warm blob storage system. In: 11th USENIX symposium on operating systems design and implementation (OSDI 14). USENIX Association, Broomfield, pp 383–398

  27. Reed IS, Solomon G (1960) Polynomial codes over certain finite fields. J Soc Ind Appl Math 8(2):300–304

  28. Jerasure. http://lab.jerasure.org/jerasure/jerasure. [Online; Accessed 10-5-2017]

  29. Ghemawat S, Gobioff H, Leung S-T (2003) The Google file system. In: Proceedings of the nineteenth ACM symposium on operating systems principles. SOSP ’03. ACM, New York, pp 29–43

  30. Data Replication- HDFS Architecture guide. https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Data+Replication. [Online; Accessed 10-5-2017]

  31. Rashmi KV, Shah NB, Gu D, Kuang H, Borthakur D, Ramchandran K (2013) A solution to the network challenges of data recovery in erasure-coded distributed storage systems: a study on the Facebook warehouse cluster. In: Proceedings of the 5th USENIX conference on hot topics in storage and file systems. HotStorage’13. USENIX Association, Berkeley, pp 8–8

  32. Krishnan MN, Prakash N, Lalitha V, Sasidharan B, Kumar PV, Narayanamurthy S, Kumar R, Nandi S (2014) Evaluation of codes with inherent double replication for Hadoop. In: 6th USENIX workshop on hot topics in storage and file systems (HotStorage 14). USENIX Association, Philadelphia

  33. Li J, Yang S, Wang X, Li B (2010) Tree-structured data regeneration in distributed storage systems with regenerating codes. In: INFOCOM, 2010 Proceedings IEEE, pp 1–9

  34. Dimakis AG, Ramchandran K, Wu Y, Suh C (2011) A survey on network codes for distributed storage. Proc IEEE 99(3):476–489

  35. Rack awareness. https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/RackAwareness.html. [Online; Accessed 10-5-2017]

Acknowledgements

This research was supported by use of the Nectar Research Cloud, a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy (NCRIS). This work was supported by Data61/CSIRO and ARC discovery project DP150104473.

Author information

Corresponding author

Correspondence to Lakshmi J. Mohan.

Appendix

1.1 Location Awareness

This section briefly introduces the notion of topology awareness with a custom network layout. The Hadoop software framework provides a ready-to-use, flexible way to define custom network topologies through bash scripts. In effect, it reuses the rack awareness mechanism designed for a single data center to implement topology awareness across a cluster. To configure a custom geographic topology, some lines must be added to the Hadoop configuration file (core-site.xml). The following lines explain the complete process:

figure a
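The original listing (figure a) is not reproduced on this page. As a minimal sketch of what such a configuration typically looks like, the fragment below registers a topology script with Hadoop. The script path is a hypothetical placeholder. The property name shown is the one used by Hadoop 2 and later; Hadoop 0.20/1.x-era releases use `topology.script.file.name` instead, so which name applies to the cluster described here is an assumption.

```xml
<!-- Hypothetical core-site.xml fragment; the script path is an assumption. -->
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.sh</value>
</property>
```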

The bash script used in our experiments is based on the standard Hadoop topology awareness code provided on the Hadoop Wiki [35]. We modified that code, drawing on other community references, to produce the following script for our geo-diverse cluster:

figure b

The above script reads a topology information file, “rack_toplology.data”, that relates racks (in our case, locations) and machines as key-value pairs using a simple format. Given that our clusters were distributed around Australia, it was natural to organize the racks according to the different locations available on the NeCTAR cloud.
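The original script (figure b) is not reproduced on this page. The following is a rough sketch of how such a lookup could work, not the authors' code. It assumes the data file holds one whitespace-separated `host rack` pair per line; the default file path, the `/default-rack` fallback, and the function name `lookup_rack` are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a Hadoop topology script (illustrative, not the paper's script).
# Assumption: each line of the data file is "<host-or-ip> <rack>".
# The default path below is hypothetical; override it with TOPOLOGY_FILE.
TOPOLOGY_FILE="${TOPOLOGY_FILE:-/etc/hadoop/conf/rack_toplology.data}"
DEFAULT_RACK="/default-rack"

# Print the rack (location) recorded for one host, or the default if unknown.
lookup_rack() {
  local node="$1" host rack
  if [ -r "$TOPOLOGY_FILE" ]; then
    # The "|| [ -n ... ]" clause also handles a last line with no newline.
    while read -r host rack || [ -n "$host" ]; do
      if [ "$host" = "$node" ] && [ -n "$rack" ]; then
        echo "$rack"
        return 0
      fi
    done < "$TOPOLOGY_FILE"
  fi
  echo "$DEFAULT_RACK"
}

# Hadoop passes one or more hosts as arguments and reads one rack per host.
for node in "$@"; do
  lookup_rack "$node"
done
```

With a data file containing lines such as `10.0.0.1 /melbourne` and `10.0.0.2 /sydney` (made-up addresses and location names), running the script with `10.0.0.2` as its argument would print `/sydney`, and any host not listed would fall back to `/default-rack`.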

Cite this article

Mohan, L.J., Caneleo, P.I.S., Parampalli, U. et al. Geo-aware erasure coding for high-performance erasure-coded storage clusters. Ann. Telecommun. 73, 139–152 (2018). https://doi.org/10.1007/s12243-017-0623-2

  • DOI: https://doi.org/10.1007/s12243-017-0623-2