Skip to main content
Log in

Dynamic erasure coding decision for modern block-oriented distributed storage systems

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

Modern block-oriented distributed storage systems like Hadoop distributed file system have proliferated in this era of big data and cloud computing. These systems feature block-level replication in which their files are partitioned into equal-sized blocks and multiple copies for each block are then arbitrarily distributed across nodes for fault tolerance and data availability. However, many storage volumes are just wasted only for keeping block copies whose data may not be accessed frequently in the strategy. Therefore, distributed storage systems begin to adopt erasure codes. However, classical parity encoding scheme are hard to be directly applied to the distributed storage systems since block copies are arbitrarily placed across nodes in the systems. We present a novel technique, called DynaEC, to address the issues in modern block-oriented distributed storage systems. DynaEC provides a unique parity encoding algorithm that encodes data blocks arbitrarily distributed across machines to parities and then places the parities guaranteeing fault tolerance. Parity encoding in DynaEC is performed without any change of the original block placement policy in Hadoop distributed file system. This makes DynaEC work seamlessly with Hadoop distributed file system. Finally, during the encoding procedure each data node encodes each own data blocks, not requiring any information about other blocks located in other data nodes. As such, the encoding procedure in DynaEC is fully performed in parallel without any synchronization issue. With extensive experiments, we show that DynaEC saves storage volumes up to the theoretical limit while outperforming previous approaches by multiple orders of magnitude.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18

Similar content being viewed by others

References

  1. Abad CL, Lu Y, Campbell RH (2011) Dare: adaptive data replication for efficient cluster scheduling. In: 2011 IEEE international conference on cluster computing (CLUSTER). IEEE, pp 159–168

  2. André F, Kermarrec AM, Le Merrer E, Le Scouarnec N, Straub G, Van Kempen A (2014) Archiving cold data in warehouses with clustered network coding. In: Proceedings of the ninth European conference on computer systems. ACM, New York, p 21

  3. Baker BS (1985) A new proof for the first-fit decreasing bin-packing algorithm. J Algorithms 6(1):49–70

    Article  MathSciNet  MATH  Google Scholar 

  4. Baker BS, Coffman EG Jr, Rivest RL (1980) Orthogonal packings in two dimensions. SIAM J Comput 9(4):846–855

    Article  MathSciNet  MATH  Google Scholar 

  5. Borthakur D (2007) The hadoop distributed file system: architecture and design. Hadoop Proj Website 11:21

    Google Scholar 

  6. Cheng Z, Luan Z, Meng Y, Xu Y, Qian D, Roy A, Zhang N, Guan G (2012) Erms: an elastic replication management system for hdfs. In: 2012 IEEE international conference on cluster computing workshops (CLUSTER WORKSHOPS). IEEE, pp 32–40

  7. Dimakis AG, Godfrey PB, Wu Y, Wainwright MJ, Ramchandran K (2010) Network coding for distributed storage systems. IEEE Trans Inf Theory 56(9):4539–4551

    Article  Google Scholar 

  8. Dimakis AG, Ramchandran K, Wu Y, Suh C (2011) A survey on network codes for distributed storage. Proc IEEE 99(3):476–489

    Article  Google Scholar 

  9. Fan B, Tantisiriroj W, Xiao L, Gibson G (2009) Diskreduce: raid for data-intensive scalable computing. In: Proceedings of the 4th annual workshop on petascale data storage. ACM, New York, pp 6–10

  10. Fan B, Tantisiriroj W, Xiao L, Gibson G (2011) Diskreduce: replication as a prelude to erasure coding in data-intensive scalable computing. In: SC high performance computing networking, storage and analysis, pp 6–8

  11. Friedman R, Kantor Y, Kantor A (2014) Replicated erasure codes for storage and repair-traffic efficiency. In: 14th IEEE international conference on peer-to-peer computing (P2P). IEEE, pp 1–10

  12. Ghemawat S, Gobioff H, Leung ST (2003) The google file system. In: ACM SIGOPS operating systems review, vol 37. ACM, New York, pp 29–43

  13. Gibson G et al (2010) Hdfs-raid wiki. http://wiki.apache.org/hadoop/HDFS-RAID. Accessed Feb 2016

  14. Hafner JL (2005) Weaver codes: highly fault tolerant erasure codes for storage systems. FAST 5:211–224

    MathSciNet  Google Scholar 

  15. Hazan E, Safra S, Schwartz O (2006) On the complexity of approximating \(k\)-set packing. Comput Complex 15(1):20–39

    Article  MathSciNet  MATH  Google Scholar 

  16. Huang C, Simitci H, Xu Y, Ogus A, Calder B, Gopalan P, Li J, Yekhanin S et al (2012) Erasure coding in windows azure storage. In: USENIX ATC, vol 12

  17. Hwang K, Jin H, Ho R (2000) Raid-x: a new distributed disk array for i/o-centric cluster computing. In: Proceedings of the ninth international symposium on high-performance distributed computing. IEEE, pp 279–286

  18. Hwang K, Jin H, Ho RS (2002) Orthogonal striping and mirroring in distributed raid for i/o-centric cluster computing. IEEE Trans Parallel Distrib Syst 13(1):26–44

    Article  Google Scholar 

  19. Jiang D, Ooi BC, Shi L, Wu S (2010) The performance of mapreduce: an in-depth study. Proc VLDB Endow 3(1–2):472–483

    Article  Google Scholar 

  20. Kaushik RT, Abdelzaher T, Egashira R, Nahrstedt K (2011) Predictive data and energy management in greenhdfs. In: 2011 international on green computing conference and workshops (IGCC). IEEE, pp 1–9

  21. Kaushik RT, Bhandarkar M (2010) Greenhdfs: towards an energy-conserving, storage-efficient, hybrid hadoop compute cluster. In: Proceedings of the USENIX annual technical conference, p 109

  22. Khan O, Burns R, Plank J, Pierce W, Huang C (2012) Rethinking erasure codes for cloud file systems: minimizing i/o for recovery and degraded reads. In: Proc. of USENIX FAST

  23. Lee KH, Lee YJ, Choi H, Chung YD, Moon B (2012) Parallel data processing with mapreduce: a survey. SIGMOD Rec 40(4):11–20

    Article  Google Scholar 

  24. MacCormick J, Murphy N, Ramasubramanian V, Wieder U, Yang J, Zhou L (2009) Kinesis: a new approach to replica placement in distributed storage systems. ACM Trans Storage (TOS) 4(4):11

    Google Scholar 

  25. Muralidhar S, Lloyd W, Roy S, Hill C, Lin E, Liu W, Pan S, Shankar S, Sivakumar V, Tang L et al (2014) F4: Facebooks warm blob storage system. In: Proceedings of the 11th USENIX conference on operating systems design and implementation. USENIX Association, pp 383–398

  26. Pamies-Juarez L, Datta A, Oggier F (2013) Rapidraid: pipelined erasure codes for fast data archival in distributed storage systems. In: INFOCOM, 2013 Proceedings IEEE. IEEE, pp 1294–1302

  27. Pamies-Juarez L, Oggier F, Datta A (2013) Decentralized erasure coding for efficient data archival in distributed storage systems. In: Distributed computing and networking. Springer, New York, pp 42–56

  28. Patterson DA, Gibson G, Katz RH (1988) A case for redundant arrays of inexpensive disks (RAID), vol 17. ACM, New York

  29. Plank JS (2009) The raid-6 liber8tion code. Int J High Perform Comput Appl 23(3):242–251

    Article  Google Scholar 

  30. Plank JS, Luo J, Schuman CD, Xu L, Wilcox-O’Hearn Z et al (2009) A performance evaluation and examination of open-source erasure coding libraries for storage. FAST 9:253–265

    Google Scholar 

  31. Plank JS, Simmerman S, Schuman CD (2008) Jerasure: a library in C/C++ facilitating erasure coding for storage applications-version 1.2. University of Tennessee, Tech. Rep. CS-08-627 23

  32. Rashmi K, Shah NB, Gu D, Kuang H, Borthakur D, Ramchandran K (2014) A hitchhiker’s guide to fast and efficient data reconstruction in erasure-coded data centers. In: Proceedings of the 2014 ACM conference on SIGCOMM. ACM, New York, pp 331–342

  33. Sathiamoorthy M, Asteris M, Papailiopoulos D, Dimakis AG, Vadali R, Chen S, Borthakur D (2013) Xoring elephants: novel erasure codes for big data. In: Proceedings of the 39th international conference on very large data bases. VLDB Endowment, pp 325–336

  34. Shvachko K, Kuang H, Radia S, Chansler R (2010) The hadoop distributed file system. In: 2010 IEEE 26th symposium on mass storage systems and technologies (MSST). IEEE, pp 1–10

  35. Stonebraker M, Schloss GA (1990) Distributed raid-a new multiple copy algorithm. In: Proceedings of sixth international conference on data engineering. IEEE, pp 430–437

  36. Wang J, Gong W, Varman P, Xie C (2012) Reducing storage overhead with small write bottleneck avoiding in cloud raid system. In: Proceedings of the 2012 ACM/IEEE 13th international conference on grid computing. IEEE Computer Society, pp 174–183

  37. Weatherspoon H, Kubiatowicz JD (2002) Erasure coding vs. replication: a quantitative comparison. In: Peer-to-peer systems. Springer, New York, pp 328–337

  38. Weil SA, Brandt SA, Miller EL, Long DD, Maltzahn C (2006) Ceph: a scalable, high-performance distributed file system. In: Proceedings of the 7th symposium on operating systems design and implementation. USENIX Association, pp 307–320

  39. Welch B, Unangst M, Abbasi Z, Gibson GA, Mueller B, Small J, Zelenka J, Zhou B (2008) Scalable performance of the panasas parallel file system. FAST 8:1–17

    Google Scholar 

  40. Xia M, Saxena M, Blaum M, Pease DA (2015) A tale of two erasure codes in hdfs. In: 13th USENIX conference on file and storage technologies (FAST’15). USENIX Association, pp 213–226

  41. Zaman S, Grosu D (2011) A distributed algorithm for the replica placement problem. IEEE Trans Parallel Distrib Syst 22(9):1455–1468

    Article  Google Scholar 

  42. Zhang Z, Deshpande A, Ma X, Thereska E, Narayanan D (2010) Does erasure coding have a role to play in my data center. Microsoft research MSR-TR-2010 52

Download references

Acknowledgments

This work was partly supported by two grants (K-16-L03-C01, R0126-15-1082) funded by the ministry of science, ICT, and future planning, Korea.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kyong-Ha Lee.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ahn, HY., Lee, KH. & Lee, YJ. Dynamic erasure coding decision for modern block-oriented distributed storage systems. J Supercomput 72, 1312–1341 (2016). https://doi.org/10.1007/s11227-016-1661-7

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-016-1661-7

Keywords

Navigation