Sector Error-Oriented Durability-Aware Fast Repair in Erasure-Coded Cloud Storage Systems

  • Conference paper
Data Mining and Big Data (DMBD 2021)

Abstract

Many erasure-coded data placement schemes have been proposed to speed up data repair. To repair data, the operator must first replace the failed node with a new node. However, almost all of these schemes assume that the node replacement process (NRP) completes quickly, which is not true in practice. In general, NRP includes failure detection and failure repair, which may take hours or even days. A long replacement delay may cause the recovered data to be lost again because durability is insufficient during that period. To improve data durability, we propose a novel scheme called Sector Error-Oriented Durability-Aware Fast Repair (SEDRepair), which carefully couples data migration and data reconstruction so that they run in parallel during repair. We conduct a mathematical analysis and compute the optimal repair in our model. The results show that, compared with traditional erasure coding methods, SEDRepair reduces repair time by up to 60% in most cases and improves data durability while keeping storage overhead minimal.
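
The benefit of running migration and reconstruction in parallel can be illustrated with a toy calculation. The sketch below is not the model analysed in the paper; it assumes (hypothetically) that the chunks of a disk with sector errors can either be copied off that disk at a fixed rate or rebuilt from surviving chunks at another fixed rate, and that the two paths do not compete for bandwidth. The function names and the rates are illustrative assumptions only.

```python
def parallel_repair_time(chunks, migrate_rate, reconstruct_rate):
    """Toy model: split `chunks` between two independent repair paths.

    migrate_rate     -- chunks/s copied directly from the still-readable disk
    reconstruct_rate -- chunks/s rebuilt by decoding from surviving chunks
    Both rates are assumed constant and independent (a simplification).
    Returns (chunks_assigned_to_migration, completion_time_in_seconds).
    """
    if chunks < 0 or migrate_rate <= 0 or reconstruct_rate <= 0:
        raise ValueError("chunks must be >= 0 and rates must be > 0")
    total_rate = migrate_rate + reconstruct_rate
    # The completion time max(x / migrate_rate, (chunks - x) / reconstruct_rate)
    # is minimised when both paths finish together, i.e. when the work is
    # split in proportion to the two rates.
    migrated = chunks * migrate_rate / total_rate
    return migrated, chunks / total_rate


def reconstruction_only_time(chunks, reconstruct_rate):
    """Baseline: every chunk is rebuilt by decoding, nothing is migrated."""
    if chunks < 0 or reconstruct_rate <= 0:
        raise ValueError("chunks must be >= 0 and rate must be > 0")
    return chunks / reconstruct_rate


# Hypothetical workload: 1000 chunks, 10 chunks/s migration, 40 chunks/s decoding.
chunks = 1000
migrated, t_par = parallel_repair_time(chunks, 10.0, 40.0)
t_rec = reconstruction_only_time(chunks, 40.0)
saving = 1 - t_par / t_rec  # fraction of repair time saved in this toy case

print(f"migrate {migrated:.0f} chunks, rebuild {chunks - migrated:.0f} chunks")
print(f"parallel: {t_par:.1f} s, reconstruction only: {t_rec:.1f} s")
```

Under these assumptions the split is proportional to the two rates, so the saving grows as the migration path gets faster relative to decoding. A real system would also have to account for shared network and disk bandwidth, which this sketch ignores.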

Acknowledgment

We thank the anonymous reviewers for their insightful feedback. We also appreciate Jingwei Li, Zhirong Shen and Hu Xiong for their sincere help.

Author information

Corresponding author

Correspondence to Shijie Zhou.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Xiao, Y., Zhou, S., Zhong, L., Zhang, Z. (2021). Sector Error-Oriented Durability-Aware Fast Repair in Erasure-Coded Cloud Storage Systems. In: Tan, Y., Shi, Y., Zomaya, A., Yan, H., Cai, J. (eds) Data Mining and Big Data. DMBD 2021. Communications in Computer and Information Science, vol 1454. Springer, Singapore. https://doi.org/10.1007/978-981-16-7502-7_41

  • DOI: https://doi.org/10.1007/978-981-16-7502-7_41

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-7501-0

  • Online ISBN: 978-981-16-7502-7

  • eBook Packages: Computer Science, Computer Science (R0)
