Abstract
The need for highly scalable and reliable big data storage systems stems from the explosive growth of data, driven in particular by social networking sites and by IoT technology across a range of applications. As a result, most information technology and medical organizations, large industries, social networking companies, and government organizations (such as ISRO) require storage capacities of 100 PB (petabytes). To store and protect such large volumes of data efficiently, recent research has considered applying erasure codes to both cloud storage and networked or distributed storage systems. The traditional triple-replication method stores three copies of every file, that is, two extra copies per file, and therefore incurs a 200% storage overhead, which makes it highly expensive. For this reason, erasure codes are being considered for parallel storage systems as an alternative to traditional replication-based storage. This paper reviews techniques for applying low-density parity-check (LDPC) codes to big data storage and identifies the open research gaps in this area.
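The overhead comparison and the basic idea of LDPC erasure coding can be illustrated with a small sketch. The Python below is a toy example under stated assumptions, not the method of any surveyed system: it computes overhead as extra blocks divided by data blocks, and it implements a systematic, XOR-based LDPC-style erasure code with iterative (peeling) decoding. The names `storage_overhead`, `xor_blocks`, `encode`, `peel_decode`, the constants `K` and `N`, and the three-check parity structure `CHECKS` are all hypothetical; practical LDPC codes for storage use far larger, carefully designed sparse graphs.

```python
from fractions import Fraction
from functools import reduce


def storage_overhead(data_blocks, stored_blocks):
    """Extra storage as a fraction of the original data size."""
    return Fraction(stored_blocks - data_blocks, data_blocks)


def xor_blocks(blocks):
    """Bytewise XOR of a non-empty list of equal-length byte blocks."""
    return reduce(lambda x, y: bytes(a ^ b for a, b in zip(x, y)), blocks)


# Hypothetical toy code: blocks 0-5 hold data, blocks 6-8 hold parity.
# Each check lists blocks whose XOR must be all zeros; the last entry
# of each check is the parity block that encode() fills in.
K, N = 6, 9
CHECKS = [
    [0, 1, 2, 6],
    [2, 3, 4, 7],
    [4, 5, 0, 8],
]


def encode(data):
    """Systematic encoding: keep the K data blocks, append N - K parities."""
    assert len(data) == K
    blocks = list(data) + [None] * (N - K)
    for check in CHECKS:
        *members, parity = check
        blocks[parity] = xor_blocks([blocks[i] for i in members])
    return blocks


def peel_decode(blocks):
    """Iterative (peeling) erasure decoding.

    Erased blocks are None. Any check with exactly one erased block is
    solved by XOR-ing its known blocks; repeat until no check makes
    progress. Unrecoverable blocks remain None in the returned copy.
    """
    blocks = list(blocks)
    progress = True
    while progress:
        progress = False
        for check in CHECKS:
            missing = [i for i in check if blocks[i] is None]
            if len(missing) == 1:
                known = [blocks[i] for i in check if blocks[i] is not None]
                blocks[missing[0]] = xor_blocks(known)
                progress = True
    return blocks


if __name__ == "__main__":
    # Triple replication stores 3 blocks per data block: 2 extra copies.
    print(f"triple replication: {float(storage_overhead(1, 3)):.0%} extra")  # 200% extra
    print(f"toy (6+3) code:     {float(storage_overhead(K, N)):.0%} extra")  # 50% extra

    data = [bytes([i] * 4) for i in range(K)]  # six 4-byte data blocks
    stored = encode(data)

    lost_two = list(stored)
    lost_two[1] = lost_two[3] = None
    print(peel_decode(lost_two) == stored)  # True

    stuck = list(stored)
    stuck[1] = stuck[6] = None
    # Blocks 1 and 6 appear only in the first check, so peeling stalls.
    print(peel_decode(stuck)[1])  # None
```

The second erasure pattern shows a known limitation of sparse codes: with 3 parity blocks over 6 data blocks, an MDS code such as Reed-Solomon would recover any 3 lost blocks, whereas this toy code already fails on some 2-block losses. In exchange, encoding and decoding use only XOR operations, which is the trade-off commonly cited for LDPC codes in storage.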

Cite this article
Bhuvaneshwari, P.V., Tharini, C. Review on LDPC Codes for Big Data Storage. Wireless Pers Commun 117, 1601–1625 (2021). https://doi.org/10.1007/s11277-020-07937-4