Abstract
Since demand for data in cloud storage systems (CSSs) is highly heterogeneous, nodes storing hot data suffer traffic congestion. In erasure-coded CSSs, this congestion can be alleviated by degraded reads, which sacrifice the bandwidth of surviving nodes. Local reconstruction codes (LRCs) reduce the bandwidth consumption of degraded reads but cannot provide a skewed throughput gain for hot data. In this paper, we propose a scalable local reconstruction code (SLRC) that builds on LRCs while offering greater flexibility in improving the throughput of a specific data block. First, we develop the local maximum throughput (LMT) metric, which measures the maximum throughput of hot data blocks by analyzing the actual read arrival rate of LRCs. We then elaborate the structure of SLRC and analyze its performance metrics, including storage overhead, reconstruction cost, and LMT. To select an appropriate code, we present minimum reconstruction cost, minimum storage overhead, and minimum penalty algorithms. Finally, we conduct extensive experiments with several typical SLRCs on the Hadoop distributed file system. Compared with RS codes and LRCs, SLRCs provide higher LMT and lower bandwidth consumption for degraded reads of hot data blocks in CSSs.
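To make the bandwidth argument concrete, the following is a minimal illustrative sketch (not the paper's SLRC construction): a toy LRC with six data blocks split into two local groups, each protected by an XOR local parity. A degraded read of one lost block touches only the surviving members of its local group, rather than k blocks as a Reed-Solomon code would.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# Six 4-byte data blocks, grouped into two local groups of three.
data = [bytes([i] * 4) for i in range(6)]
groups = [data[0:3], data[3:6]]
local_parity = [xor_blocks(g) for g in groups]  # one XOR parity per group

# Degraded read: block 1 (in group 0) is unavailable.
# Read the two surviving group members plus the group's local parity.
survivors = [data[0], data[2], local_parity[0]]
reconstructed = xor_blocks(survivors)

assert reconstructed == data[1]
# Only 3 blocks were read; a (6, 3) RS code would read all 6 data blocks.
```

The group size here is an arbitrary choice for illustration; the paper's contribution is precisely in how SLRC tunes such group structure to skew throughput toward hot blocks.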
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61831008, 62027802, 62271165), Guangdong Science and Technology Planning Project (Grant No. 2021A1515011572), Shenzhen Natural Science Fund (Grant No. JCYJ20200109112822953), Shenzhen Natural Science Fund (Stable Support Plan Program) (Grant No. GXWD20201230155427003-20200824081029001), and Major Key Project of PCL (Grant No. PCL2021A03-1).
Cite this article
Zhang, Z., Gu, S. & Zhang, Q. Scalable local reconstruction code design for hot data reads in cloud storage systems. Sci. China Inf. Sci. 65, 222303 (2022). https://doi.org/10.1007/s11432-021-3421-6