
H-V: An Improved Coding Layout Based on Erasure Coded Storage System

  • Conference paper
Database Systems for Advanced Applications. DASFAA 2022 International Workshops (DASFAA 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13248)

Abstract

Failures of individual unreliable commodity components are very common in large-scale distributed storage systems, and many studies have therefore addressed data reliability in such systems. Among the resulting techniques, erasure codes are widely used in production storage systems, such as the Hadoop Distributed File System (HDFS), to provide high fault tolerance with lower storage overhead. However, recovering from a node failure in an erasure-coded storage system usually incurs heavy cross-node and cross-rack bandwidth consumption, which slows failure recovery and wastes resources. In this paper, we improve the erasure code storage strategy in Hadoop 3.x and propose H\(^{RS(n,k)}\)-V\(^{RS(n',k')}\), abbreviated as H-V, which adds RS parity checks inside the data nodes. This effectively reduces cross-node and cross-rack data transmission during recovery, lowers cross-rack bandwidth usage, and improves recovery efficiency. Theoretical analysis shows that, compared with traditional RS erasure code storage, H-V reduces the cross-node and cross-rack bandwidth of RS by at least 25% and 62.5%, respectively, during data recovery; compared with D\(^3\), H-V reduces the storage redundancy by up to 19.7% while reducing the cross-node and cross-rack bandwidth of D\(^3\) by 25% and 12.5% during data recovery.
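To make the baseline trade-off concrete, the following minimal sketch computes the storage overhead and single-block repair cost of a conventional RS stripe. It does not model H-V itself, whose layout the abstract does not specify. The class and field names (`RSLayout`, `data`, `parity`) are illustrative assumptions, and no claim is made about how they map onto the paper's RS(n,k) notation. The example values correspond to the RS policy with 6 data and 3 parity blocks that ships with Hadoop 3.x.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RSLayout:
    """Conventional Reed-Solomon stripe (hypothetical helper, not from the paper)."""
    data: int    # number of data blocks per stripe
    parity: int  # number of parity blocks per stripe

    def storage_overhead(self) -> float:
        # Raw bytes stored per byte of user data.
        return (self.data + self.parity) / self.data

    def blocks_read_for_single_repair(self) -> int:
        # Rebuilding one lost block requires any `data` surviving blocks.
        return self.data


def replication_overhead(copies: int) -> float:
    # With plain replication, every byte is stored `copies` times.
    return float(copies)


rs_6_3 = RSLayout(data=6, parity=3)
print(rs_6_3.storage_overhead())               # 1.5
print(replication_overhead(3))                 # 3.0
print(rs_6_3.blocks_read_for_single_repair())  # 6
```

The sketch shows why recovery traffic matters: the RS stripe halves the storage cost of three-way replication, but repairing a single lost block reads six surviving blocks instead of one replica, and those blocks usually sit on other nodes and racks.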



Acknowledgements

The research was supported in part by the National Natural Science Foundation of China (Grant No. 61872043), Qin Xin Talents Cultivation Program, Beijing Information Science & Technology University (No. QXTCP B201904), State Key Laboratory of Computer Architecture (ICT, CAS) under Grant No. CARCHA202103, the key scientific and technological projects of Henan Province (Grant No. 202102210174), and the Key Scientific Research Projects of Henan Higher School (Grant No. 19A520043).

Author information

Corresponding author

Correspondence to Ying Song.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mu, T., Song, Y., Yang, M., Wang, B., Zhao, J. (2022). H-V: An Improved Coding Layout Based on Erasure Coded Storage System. In: Rage, U.K., Goyal, V., Reddy, P.K. (eds) Database Systems for Advanced Applications. DASFAA 2022 International Workshops. DASFAA 2022. Lecture Notes in Computer Science, vol 13248. Springer, Cham. https://doi.org/10.1007/978-3-031-11217-1_15

  • DOI: https://doi.org/10.1007/978-3-031-11217-1_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11216-4

  • Online ISBN: 978-3-031-11217-1

  • eBook Packages: Computer Science, Computer Science (R0)
