HDS: optimizing data migration and parity update to realize RAID-6 scaling for HDP

Yuan, Zhu; You, Xindong; Lv, Xueqiang; Li, Muyuan; Xie, Ping

doi:10.1007/s10586-021-03379-0

Published: 08 August 2021

Cluster Computing, Volume 24, pages 3815–3835 (2021)

Zhu Yuan^1,2,3,4,5,
Xindong You^3,
Xueqiang Lv^2,3,
Muyuan Li^1,2,4,5 &
Ping Xie^1,2,4,5 (ORCID: orcid.org/0000-0001-9122-8534)

Abstract

Data overload constantly threatens the reliability of storage systems. RAID-6 storage systems provide high reliability and flexible scalability, and RAID-6 scaling can quickly relieve a shortage of storage capacity. To this end, this paper proposes Horizontal Data migration Scaling (HDS), an efficient RAID-6 scaling scheme for HDP Code (horizontal-diagonal parity code). First, HDS migrates only a small amount of data from the old disks to the new disks to restore I/O load balance across all disks, old and new. Second, it optimizes the update order of anti-diagonal parity data to reduce the cost of parity updates. Using numerical analysis and real experimental data, this paper compares the performance of HDS with Round-Robin and Semi-RR. Compared with Round-Robin and Semi-RR, the results indicate that (1) HDS reduces data migration by 59.9–83.3%; (2) HDS decreases the total cost of XOR operations by 36.84–71.43% and 66.04–76.92%, respectively; (3) HDS shortens the total offline scaling time by 43.78–61.83% and 16.39–48.89%, respectively.
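The two cost terms named in the abstract, data migration and parity (XOR) updates, can be made concrete with a minimal, generic sketch. This is not the HDS algorithm, which this page does not describe. It assumes a simplified model in which disks hold only equal-sized data blocks and "balanced" means an equal block count per disk; under that assumption, adding n empty disks to m old ones forces at least n/(m+n) of all blocks to move. The parity part shows the standard incremental read-modify-write update (P' = P XOR D_old XOR D_new), not HDS's anti-diagonal update ordering. The function names `min_blocks_to_migrate`, `xor_parity` and `update_parity` are hypothetical.

```python
from functools import reduce
from operator import xor


def min_blocks_to_migrate(blocks_per_old_disk: int, old_disks: int, new_disks: int) -> int:
    """Lower bound on blocks that must move so all disks hold equal shares.

    Simplified model (assumption): data blocks only, all the same size,
    and new disks start empty, so every block they end up holding must
    be migrated from an old disk.
    """
    total = blocks_per_old_disk * old_disks
    disks = old_disks + new_disks
    if total % disks != 0:
        raise ValueError("total blocks must split evenly across all disks")
    share = total // disks        # blocks per disk after scaling
    return new_disks * share      # equals total * n / (m + n)


def xor_parity(blocks: list) -> int:
    """Parity of one parity chain: XOR of its data blocks (modeled as ints)."""
    return reduce(xor, blocks, 0)


def update_parity(old_parity: int, old_value: int, new_value: int) -> int:
    """Incremental parity update, 2 XORs: P' = P xor D_old xor D_new."""
    return old_parity ^ old_value ^ new_value


if __name__ == "__main__":
    # 4 old disks with 12 blocks each plus 2 new disks: 16 of 48 blocks (1/3) move.
    print(min_blocks_to_migrate(12, 4, 2))

    chain = [0b0011, 0b0101, 0b0110]
    p = xor_parity(chain)
    p_new = update_parity(p, chain[1], 0b1001)   # block 1 changes from 5 to 9
    chain[1] = 0b1001
    print(p_new == xor_parity(chain))            # True
```

Real RAID-6 codes such as HDP Code also place parity on every disk, so the actual migration volume and XOR count depend on the code layout and the migration order; this sketch models neither.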

Data availability

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

A part of this work was presented at the 2019 International Conference on Communication and Information Processing (ICCIP 2019); this manuscript contains substantial changes relative to that version. This work was supported by the National Natural Science Foundation of China under Grants No. 61762075, No. 61671070, No. 61972364 and No. 61862055, and by the Provincial Natural Science Foundation of Qinghai under Grant No. 2020-ZJ-926. The authors also acknowledge the Natural Science Foundation of Beijing under Grant No. 4212020, the Defense-related Science and Technology Key Lab Fund project under Grant No. 6412006200404, the Qin Xin Talents Cultivation Program of Beijing Information Science and Technology University under Grant No. QXTCP B201908, and the Research Planning of Beijing Municipal Commission of Education under Grant No. KM202111232001.

Author information

Authors and Affiliations

Computer College, Qinghai Normal University, Xining, China
Zhu Yuan, Muyuan Li & Ping Xie
The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining, China
Zhu Yuan, Xueqiang Lv, Muyuan Li & Ping Xie
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing, China
Zhu Yuan, Xindong You & Xueqiang Lv
Key Laboratory of Internet of Things of Qinghai Province, Xining, China
Zhu Yuan, Muyuan Li & Ping Xie
Academy of Plateau Science and Sustainability, Xining, China
Zhu Yuan, Muyuan Li & Ping Xie

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Zhu Yuan and Muyuan Li. Xindong You and Xueqiang Lv guided the whole experimental process. The project originated with Ping Xie, who participated in and guided the whole work as the corresponding author. The first draft of the manuscript was written by Zhu Yuan, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ping Xie.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with animals performed by any of the authors. Informed consent was obtained from all individual participants included in the study.

Informed consent

All authors of this manuscript have given informed consent to all of the above contents and statements.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yuan, Z., You, X., Lv, X. et al. HDS: optimizing data migration and parity update to realize RAID-6 scaling for HDP. Cluster Comput 24, 3815–3835 (2021). https://doi.org/10.1007/s10586-021-03379-0

Received: 15 February 2021
Revised: 28 June 2021
Accepted: 30 July 2021
Published: 08 August 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s10586-021-03379-0
