P-Schedule: Erasure Coding Schedule Strategy in Big Data Storage System

Yin, Chao; Lv, Haitao; Li, Tongfang; Liu, Yan; Qu, Xiaoping; Yuan, Sihao

doi:10.1007/978-3-030-05057-3_22


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11336)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing


Abstract

Erasure coding is one of the key technologies in big data storage systems. A well-designed erasure code can improve not only the reliability of a big data storage system but also its performance. Most existing big data storage systems use a replica strategy, which provides good availability and real-time access but causes substantial data redundancy and wastes storage space. A large part of the data held in such systems is cold data. In this paper, we target cold data, which does not place high demands on availability or real-time access. We propose a scheme that supports both the replica strategy and the coding strategy, and we design the corresponding node scheduling and data addressing schemes. We select the Liberation code, which performs well on write operations, and develop the P-Schedule scheme to optimize decoding speed. Together, these designs improve the disk utilization and write speed of cold data in the big data system. Test results show that the sequential write performance of erasure coding is better than that of the replica strategy, and that performance improves as the data block size increases.
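The storage-space argument in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes 3-way replication and an erasure code with k = 6 data blocks and m = 2 parity blocks (two parity blocks being the arrangement used by RAID-6 style codes such as the Liberation code). The function names are hypothetical.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw blocks stored per block of user data under n-way replication."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return Fraction(copies)


def erasure_overhead(k: int, m: int) -> Fraction:
    """Raw blocks stored per block of user data for k data + m parity blocks."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return Fraction(k + m, k)


def usable_fraction(overhead: Fraction) -> Fraction:
    """Share of raw disk capacity that holds user data."""
    return 1 / overhead


# Assumed parameters, chosen for illustration only (not from the paper).
replica = replication_overhead(3)   # 3 raw blocks per user block
coded = erasure_overhead(6, 2)      # 8 raw blocks per 6 user blocks

print("replication:", replica, "usable:", usable_fraction(replica))
print("erasure code:", coded, "usable:", usable_fraction(coded))
```

Under these assumed parameters, replication leaves one third of the raw capacity for user data while the coded layout leaves three quarters, and both tolerate the loss of any two storage nodes. This is the kind of disk-utilization gain the abstract refers to for cold data.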



Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61662038), the Science and Technology Project of the Jiangxi Provincial Department of Education (No. GJJ151081), the Visiting Scholar Funds of the China Scholarship Council, and the Jiangxi Association for Science and Technology.

Author information

Authors and Affiliations

Jiujiang University, Jiujiang, 332005, China
Chao Yin, Haitao Lv, Tongfang Li, Yan Liu, Xiaoping Qu & Sihao Yuan

Corresponding author

Correspondence to Haitao Lv.

Editor information

Editors and Affiliations

Rutgers University–Newark, Newark, NJ, USA
Jaideep Vaidya
Guangzhou University, Guangzhou, China
Jin Li

Cite this paper

Yin, C., Lv, H., Li, T., Liu, Y., Qu, X., Yuan, S. (2018). P-Schedule: Erasure Coding Schedule Strategy in Big Data Storage System. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11336. Springer, Cham. https://doi.org/10.1007/978-3-030-05057-3_22

DOI: https://doi.org/10.1007/978-3-030-05057-3_22
Published: 07 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05056-6
Online ISBN: 978-3-030-05057-3
eBook Packages: Computer Science, Computer Science (R0)
