Skip to main content

P-Schedule: Erasure Coding Schedule Strategy in Big Data Storage System

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11336))

Abstract

Erasure coding technology is one of the key technologies in big data storage system. A well designed erasure coding can not only improve the reliability of the big data storage system, but also greatly improve the performance. Most of the existing big data storage systems use replica strategy, which can provide good availability and real-time, but it has caused a lot of data redundancy and waste of storage space. A large part of the data stored in the storage system exists in the form of cold data. In this paper, we aim at the cold data which doesn’t require highly on data availability and real-time in the big data storage system. We have proposed a scheme to support both replica strategy and coding strategy, and designed the node scheduling and data addressing scheme. We selected Liberation code which is excellent in writing operation, and developed P-Schedule scheme to optimize the decoding speed. Through a series of designs, we can effectively improve the disk utilization and write speed of the cold data in the big data system. The test results show that the sequential write performance of erasure coding is better than that of the replica strategy. The larger the data block is, the better the performance is.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Morris, R.J.T., Truskowski, B.J.: The evolution of storage systems. IBM Syst. J. 42(2), 205–217 (2003)

    Article  Google Scholar 

  2. Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R., Muharemagic, E.: Deep learning applications and challenges in big data analytics. J. Big Data 2(1), 1–21 (2015)

    Article  Google Scholar 

  3. Schermann, M., Hemsen, H., Buchmüller, C., Bitter, T., Krcmar, H., Markl, V., Hoeren, T.: Big data. Bus. Inf. Syst. Eng. 6(5), 261–266 (2014)

    Article  Google Scholar 

  4. Chen, Y., Chen, H., Gorkhali, A., Lu, Y., Ma, Y., Li, L.: Big data analytics and big data science: a survey. J. Manag. Anal. 3(1), 1–42 (2016)

    Google Scholar 

  5. Li, S., Cao, Q., Wan, S., Qian, L., Xie, C.: HRSPC: a hybrid redundancy scheme via exploring computational locality to support fast recovery and high reliability in distributed storage systems. J. Netw. Comput. Appl. (2015)

    Google Scholar 

  6. Calder, B., Wang, J., Ogus, A., et al.: Windows Azure storage: a highly available cloud storage service with strong consistency. In: Proceeding of the Twenty-Third ACM Symposium on Operating Systems Principles, pp. 143–157 (2011)

    Google Scholar 

  7. Chun, B.G., Dabek, F., Haeberlen, A., et al.: Efficient replica maintenance for distributed storage systems. In: Proceedings of NSDI, pp. 225–264 (2006)

    Google Scholar 

  8. Chen, P.M., Lee, E.K., Gibson, G.A., et al.: RAID: high-performance, reliable secondary storage. ACM Comput. Surv.–CSUR 26(2), 145–185 (1994)

    Article  Google Scholar 

  9. Corbett, P., English, B., Goel, A., et al.: Row-diagonal parity for double disk failure correction. In: FAST 2004: Proceedings of the 3rd USENIX Conference on File and Storage Technologies, pp. 1–14 (2004)

    Google Scholar 

  10. Xiang, L., Xu, Y., Lui, J., et al.: Optimal recovery of single disk failure in RDP code storage systems. In: SIGMETRICS 2010 Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 119–130 (2010)

    Google Scholar 

  11. Blaum, M., Brady, J., Bruck, J., et al.: EVENODD: an efficient scheme for tolerating double disk failures in RAID architectures. IEEE Trans. Comput. 44(2), 192–202 (1995)

    Article  Google Scholar 

  12. Huang, C., Xu, L.: STAR: an efficient coding scheme for correcting triple storage node failures. IEEE Trans. Comput. 57(7), 889–901 (2008)

    Article  MathSciNet  Google Scholar 

  13. Reed, I.S., Solomon, G.: Polynomial codes over certain finite fields. J. Soc. Ind. Appl. Math. 8(2), 300–304 (1996)

    Article  MathSciNet  Google Scholar 

  14. Rodrigues, R., Liskov, B.: High availability in DHTs: erasure coding vs. replication. In: Castro, M., van Renesse, R. (eds.) IPTPS 2005. LNCS, vol. 3640, pp. 226–239. Springer, Heidelberg (2005). https://doi.org/10.1007/11558989_21

    Chapter  Google Scholar 

  15. Luo, J., Bowers, K.D., Oprea, A., Xu, L.: Efficient software implementations of large finite fields GF(2n) for secure storage applications. ACM Trans. Storage 8(2) (2012)

    Article  Google Scholar 

Download references

Acknowledgements

This work was supported by National Natural Science Foundation of China (No. 61662038), Science and technology project of Jiangxi Provincial Department of Education (No. GJJ151081), the Visiting Scholar Funds by China Scholarship Council, the JiangXi Association for Science and Technology.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Haitao Lv .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Yin, C., Lv, H., Li, T., Liu, Y., Qu, X., Yuan, S. (2018). P-Schedule: Erasure Coding Schedule Strategy in Big Data Storage System. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science(), vol 11336. Springer, Cham. https://doi.org/10.1007/978-3-030-05057-3_22

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-05057-3_22

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05056-6

  • Online ISBN: 978-3-030-05057-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics