Abstract
Erasure coding is one of the key technologies in big data storage systems. A well-designed erasure code can improve not only the reliability of a big data storage system but also its performance. Most existing big data storage systems use a replica strategy, which provides good availability and real-time performance but causes heavy data redundancy and wastes storage space. A large part of the data held in a storage system is cold data. In this paper, we target cold data, which does not place high demands on availability or real-time performance. We propose a scheme that supports both the replica strategy and the coding strategy, and we design the corresponding node scheduling and data addressing schemes. We select the Liberation code, which performs well on write operations, and develop the P-Schedule scheme to optimize decoding speed. Together, these designs effectively improve disk utilization and write speed for cold data in a big data system. Test results show that the sequential write performance of erasure coding is better than that of the replica strategy, and that performance improves as the data block size increases.
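The storage-space argument in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, the three-copy replication factor is an assumed common default, and k = 6 data blocks is an arbitrary choice. The value m = 2 reflects that Liberation codes are a RAID-6 family with two coding blocks per stripe.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data when keeping `copies` full replicas."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return Fraction(copies)


def erasure_overhead(k: int, m: int) -> Fraction:
    """Raw bytes stored per byte of user data for a stripe of k data + m coding blocks."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return Fraction(k + m, k)


def disk_utilization(overhead: Fraction) -> Fraction:
    """Fraction of raw capacity that holds user data."""
    return 1 / overhead


if __name__ == "__main__":
    # Assumed parameters, not the paper's configuration.
    replica = replication_overhead(3)      # three full copies
    coded = erasure_overhead(k=6, m=2)     # RAID-6 style stripe, hypothetical k
    for name, ov in (("3-way replica", replica), ("EC (6+2)", coded)):
        print(f"{name}: overhead {float(ov):.2f}x, "
              f"utilization {float(disk_utilization(ov)):.0%}")
```

Under these assumed parameters, both layouts survive the loss of any two blocks of a stripe, but the coded layout stores 4/3 of the user data instead of 3 times it. This is the kind of disk-utilization gain the abstract targets for cold data.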
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61662038), the Science and Technology Project of the Jiangxi Provincial Department of Education (No. GJJ151081), the Visiting Scholar Funds of the China Scholarship Council, and the Jiangxi Association for Science and Technology.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Yin, C., Lv, H., Li, T., Liu, Y., Qu, X., Yuan, S. (2018). P-Schedule: Erasure Coding Schedule Strategy in Big Data Storage System. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11336. Springer, Cham. https://doi.org/10.1007/978-3-030-05057-3_22
DOI: https://doi.org/10.1007/978-3-030-05057-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05056-6
Online ISBN: 978-3-030-05057-3
eBook Packages: Computer Science, Computer Science (R0)