Towards a delivery scheme for speedup of data backup in distributed storage systems using erasure codes

The Journal of Supercomputing

Abstract

Distributed storage systems built on peer-to-peer networks can provide large-scale data storage and high data reliability through redundancy. Data backup is the process of storing data on a set of redundant storage nodes, and completing it quickly is critical to maintaining system performance. Traditional data backup in distributed systems based on erasure codes uses a star-structured scheme, in which the source node sends each redundant block directly to its target storage node; because bandwidth is heterogeneous, storage throughput and delay are limited by the bottleneck bandwidth. The recent “in-network” redundancy generation scheme uses the locally repairable property of self-repairing codes to speed up data backup. However, such codes do not have the maximum distance separable (MDS) property and therefore do not achieve optimal storage efficiency. A fast backup scheme for distributed systems based on general erasure coding is still lacking. To this end, we propose that, instead of focusing only on the bandwidths between the source node and the target nodes, the bandwidths between target storage nodes should also be fully taken into account. In our scheme, each redundant data block is divided into parts according to different proportions, and each part is sent to the target storage node via other storage nodes. The benefit is that spare bandwidth between target storage nodes is used to reduce backup time. We further show how this process can be modeled and derive a formula for the final backup time. The minimum backup time can then be obtained by solving a classical quadratic programming problem. We conduct both numerical analysis and an experimental study. Our experiments show that the delay is reduced by 59% compared with the common star-structured scheme, while the throughput of the backup process increases significantly.
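To make the idea of splitting a block across relay paths concrete, the following is a minimal sketch and not the model derived in the article. It assumes a hypothetical setup with one source and two target nodes, A and B, made-up bandwidths, fully pipelined forwarding, and a backup time equal to the time of the busiest link. A plain grid search stands in for the optimization step; the function names and all numbers are illustrative assumptions.

```python
# Toy comparison of star-structured delivery with a split-and-relay delivery.
# Assumptions (not from the article): two targets "A" and "B", one block per
# target, pipelined forwarding, backup time = time of the most loaded link.

BLOCK_MB = 100.0                    # hypothetical size of each redundant block
BW_SRC = {"A": 10.0, "B": 2.0}      # hypothetical source -> target bandwidth, MB/s
BW_A_TO_B = 8.0                     # hypothetical bandwidth between targets, MB/s


def star_time(block_mb, bw_src):
    """Backup time when every block goes straight from the source."""
    return max(block_mb / bw for bw in bw_src.values())


def relay_time(block_mb, bw_src, bw_a_to_b, x):
    """Backup time when a fraction x of B's block travels source -> A -> B."""
    # The source -> A link carries A's own block plus the relayed part for B.
    t_src_a = (block_mb + x * block_mb) / bw_src["A"]
    # The source -> B link carries only the part of B's block sent directly.
    t_src_b = (1 - x) * block_mb / bw_src["B"]
    # The A -> B link carries the relayed part.
    t_a_b = x * block_mb / bw_a_to_b
    return max(t_src_a, t_src_b, t_a_b)


def best_split(block_mb, bw_src, bw_a_to_b, steps=1000):
    """Grid search for the split fraction that minimizes the backup time."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda x: relay_time(block_mb, bw_src, bw_a_to_b, x))
```

With these made-up numbers, `star_time(BLOCK_MB, BW_SRC)` is 50 s because the slow link to B is the bottleneck, while `best_split(BLOCK_MB, BW_SRC, BW_A_TO_B)` routes about two thirds of B's block through A and brings the time down to roughly 16.7 s. The figures only illustrate the mechanism and say nothing about the results reported in the article.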





Acknowledgments

This research is supported by the National Basic Research Program of China under Grant No. 2014CB340303, the Program of the National Natural Science Foundation of China under Grant Nos. 61402514 and 61402490, and the Scientific Research Program of the Hunan Provincial Education Department (No. 12b012).

Author information

Corresponding author

Correspondence to Pengfei You.

About this article

Cite this article

You, P., Huang, Z., Peng, Y. et al. Towards a delivery scheme for speedup of data backup in distributed storage systems using erasure codes. J Supercomput 75, 50–64 (2019). https://doi.org/10.1007/s11227-015-1586-6

  • DOI: https://doi.org/10.1007/s11227-015-1586-6
