Abstract
Erasure codes such as Reed-Solomon (RS) and Cauchy RS (CRS) codes play an important role in big data storage systems in both industry and academia. Although RS and CRS codes save significant storage space, their encoding and decoding can impose a heavy performance burden on the system. Given the high reliability and space savings that existing coding technologies offer, there is an urgent need for an efficient erasure coding mechanism in distributed storage systems, the main storage architecture of the big data era. This paper proposes an optimized algorithm named OptRS (Optimized RS), which guarantees system reliability while improving efficiency and storage space utilization. Matrix computation dominates encoding and decoding in erasure codes. To accelerate it, OptRS maps matrix computation over the Galois field to XOR operations, and it applies elimination schemes to minimize the number of XORs. Theoretical analysis shows that OptRS improves encoding and decoding performance and shortens computation time, and experiments confirm this. Encoding with OptRS is 36.1% and 58.2% faster than with CRS and RS coding, respectively. On average, the decoding rate with OptRS is 19.3% and 33.1% higher than with CRS and RS, respectively.
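The idea of replacing Galois field multiplication with XORs can be illustrated with a minimal sketch. It is not the OptRS algorithm and does not include its elimination schemes; it only shows the general CRS-style bit-matrix technique. The field GF(2^4), the primitive polynomial x^4 + x + 1, and the helper names `gf_mul`, `bit_matrix`, `xor_mul`, and `ones` are assumptions chosen for illustration.

```python
W = 4                 # assumed word size: the field is GF(2^4)
PRIM_POLY = 0b10011   # x^4 + x + 1, a primitive polynomial for GF(2^4)


def gf_mul(a, b):
    """Multiply a and b in GF(2^W) by shift-and-add with polynomial reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << W):   # degree reached W: reduce modulo PRIM_POLY
            a ^= PRIM_POLY
    return result


def bit_matrix(e):
    """Return the W columns of the binary matrix for 'multiply by e'.

    Column j is stored as an integer bit mask equal to e * x^j in GF(2^W).
    """
    return [gf_mul(e, 1 << j) for j in range(W)]


def xor_mul(columns, d):
    """Multiply d by the constant behind 'columns' using only XORs."""
    result = 0
    for j, col in enumerate(columns):
        if (d >> j) & 1:   # bit j of d selects column j
            result ^= col
    return result


def ones(columns):
    """Count the 1 bits in the matrix; fewer ones means fewer XORs."""
    return sum(bin(c).count("1") for c in columns)


if __name__ == "__main__":
    for e in range(1, 1 << W):
        cols = bit_matrix(e)
        # The XOR-only product agrees with field multiplication for every d.
        assert all(xor_mul(cols, d) == gf_mul(e, d) for d in range(1 << W))
        print(f"e={e:2d}  ones in bit matrix={ones(cols)}")
```

Because multiplication by a fixed element is linear over GF(2), each coefficient of a coding matrix expands into a W×W binary matrix, and the count of ones in that matrix determines how many XORs are needed. Schemes that minimize XORs aim to reduce this count.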
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Yin, C. et al. (2015). OptRS: An Optimized Algorithm Based on CRS Codes in Big Data Storage Systems. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9528. Springer, Cham. https://doi.org/10.1007/978-3-319-27119-4_16
DOI: https://doi.org/10.1007/978-3-319-27119-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27118-7
Online ISBN: 978-3-319-27119-4
eBook Packages: Computer Science (R0)