Review on LDPC Codes for Big Data Storage

Wireless Personal Communications

Abstract

Data is growing explosively, driven in particular by social networking sites and IoT technology across various applications, which creates a need for highly scalable and reliable big data storage systems. Most information technology companies, medical organizations, large industries, social networking companies, and government organizations (such as ISRO) therefore require storage capacities of 100 PB (petabytes). To secure and store data at this scale efficiently, recent research has considered applying erasure codes to both cloud storage and networked or distributed storage systems. The traditional triple-replication method, which stores three copies of every file and thus incurs an extra 200% storage overhead, is highly expensive. Erasure codes are considered for parallel storage systems as an alternative to this traditional approach. This paper reviews the various techniques for applying LDPC codes to big data storage and identifies research gaps for further investigation of LDPC codes in this setting.
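As a rough illustration of the overhead arithmetic in the abstract, the sketch below compares the extra storage needed by full replication with that of a generic erasure code that adds m parity blocks to k data blocks. The function names and the (k = 10, m = 4) parameters are assumptions chosen for illustration; they are not taken from the paper.

```python
def replication_overhead(copies: int) -> float:
    """Extra storage, as a fraction of the original data, when keeping
    `copies` full replicas (the original counts as one of them)."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return float(copies - 1)


def erasure_overhead(k: int, m: int) -> float:
    """Extra storage, as a fraction of the original data, for a code that
    stores k data blocks plus m parity blocks."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return m / k


def raw_capacity_pb(user_data_pb: float, overhead: float) -> float:
    """Raw capacity needed to hold `user_data_pb` petabytes of user data."""
    return user_data_pb * (1.0 + overhead)


if __name__ == "__main__":
    user_data_pb = 100.0  # capacity figure mentioned in the abstract

    triple = replication_overhead(3)  # three copies of every file
    coded = erasure_overhead(10, 4)   # hypothetical (k=10, m=4) code

    print(f"triple replication: {triple:.0%} overhead, "
          f"{raw_capacity_pb(user_data_pb, triple):.0f} PB raw")
    print(f"(10, 4) erasure code: {coded:.0%} overhead, "
          f"{raw_capacity_pb(user_data_pb, coded):.0f} PB raw")
```

The comparison covers storage cost only; repair bandwidth, decoding cost, and the failure patterns each scheme tolerates are separate trade-offs.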

Author information

Corresponding author

Correspondence to C. Tharini.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bhuvaneshwari, P.V., Tharini, C. Review on LDPC Codes for Big Data Storage. Wireless Pers Commun 117, 1601–1625 (2021). https://doi.org/10.1007/s11277-020-07937-4

  • DOI: https://doi.org/10.1007/s11277-020-07937-4
