Graphics processing unit-accelerated joint-bitplane belief propagation algorithm in DSC

  • Published:
The Journal of Supercomputing

Abstract

An M-ary source with nonstationary correlation can be encoded with a single binary low-density parity-check (LDPC) code, and its bitplanes can then be decoded together in distributed source coding (DSC). Joint-bitplane belief propagation (JBBP) is a useful algorithm for jointly decoding the multiple bitplanes of an M-ary source. However, it has low computational efficiency and a long execution time. Motivated by the evolution of graphics processing units (GPUs) and the inherent parallelism of JBBP, we propose a novel approach that runs the computationally intensive parts of the JBBP algorithm on a GPU using the Compute Unified Device Architecture (CUDA) programming model. Two different parallel modes are used for passing beliefs between the different nodes of JBBP. We find that the bottlenecks of JBBP lie in computing the overall probability mass functions (pmfs) of the symbol nodes and the overall beliefs of the bit nodes. We therefore use a data partitioning method that splits the large array of pmfs into small pieces that can be held in the L1 cache rather than in global memory. The optimal thread-block size is chosen so that it gives each thread as much L1 cache as possible while still keeping multiple warps active on each streaming multiprocessor. Experimental results show that when a length-6336 (resp. length-50,688) LDPC accumulate (LDPCA) code is used to compress the source, the JBBP decoder achieves about a 20× (resp. 41×) speedup on the GPU compared with the original C code on the CPU. Even larger speedups are expected with longer LDPCA codes. The parallel JBBP is also applied to hyperspectral image compression and video coding, where it likewise achieves good speedups.
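To make the symbol-node computation named above as a bottleneck more concrete, here is a minimal CPU-side Python sketch. It assumes a common formulation of joint-bitplane decoding rather than the authors' exact method: a symbol x takes values 0, ..., M-1 with M = 2^L, bitplane l of x is bit l of its binary index, p(x | y) is the side-information pmf, and each bit node sends its symbol node a probability pair [P(b=0), P(b=1)]. The overall symbol pmf is proportional to p(x) times every incoming bit message, and the extrinsic message back to bitplane l sums p(x) times all incoming messages except bitplane l's own, over the symbols whose bit l equals b. The function names `bit` and `symbol_node_update` are hypothetical, and this is not the paper's CUDA implementation.

```python
def bit(x, l):
    """Return bitplane l (0 = least significant) of symbol x."""
    return (x >> l) & 1


def symbol_node_update(prior, bit_msgs):
    """One symbol-node update in a joint-bitplane BP sketch (hypothetical).

    prior    -- M = 2**L probabilities p(x | side information), x = 0..M-1
    bit_msgs -- L pairs [P(b=0), P(b=1)], the incoming message from the
                bit node of each bitplane l = 0..L-1
    Returns (overall, out_msgs): the normalized overall pmf of the symbol
    and, for each bitplane, the extrinsic message sent back to its bit node.
    Assumes the messages are not all zero (no guard against z == 0).
    """
    M, L = len(prior), len(bit_msgs)
    if M != 1 << L:
        raise ValueError("prior must have 2**L entries")

    # Overall pmf: the prior times every incoming bit message, normalized.
    overall = []
    for x in range(M):
        p = prior[x]
        for l in range(L):
            p *= bit_msgs[l][bit(x, l)]
        overall.append(p)
    z = sum(overall)
    overall = [p / z for p in overall]

    # Extrinsic message to bitplane l: sum over all symbols whose bit l is b,
    # leaving out bitplane l's own incoming message.
    out_msgs = []
    for l in range(L):
        acc = [0.0, 0.0]
        for x in range(M):
            p = prior[x]
            for k in range(L):
                if k != l:
                    p *= bit_msgs[k][bit(x, k)]
            acc[bit(x, l)] += p
        s = acc[0] + acc[1]
        out_msgs.append([acc[0] / s, acc[1] / s])
    return overall, out_msgs
```

For example, with prior [0.1, 0.2, 0.3, 0.4] (L = 2) and uninformative bit messages [[0.5, 0.5], [0.5, 0.5]], the overall pmf equals the prior and the extrinsic message to bitplane 0 is approximately [0.4, 0.6]. In this sketch every symbol node is updated independently from its own M-entry pmf, which is why, under our assumption, such updates map well to one GPU thread per symbol and why the per-symbol pmf arrays form the working set that must fit in fast on-chip memory.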

Figs. 1–13

Acknowledgments

This work was supported by the National Science Foundation of China (grant no. 61271280), the Fundamental Research Fund for the Central Universities of China (grant nos. 2452015059, 2014YQ001, and QN2013086), the Program for New Century Excellent Talents in University of China (grant no. NCET-13-0481), the Provincial Science Foundation of Shaanxi, China-Key Project (grant no. 2016JZ024), and the Program for Youth Sci-Tech Nova of Shaanxi, China (grant no. 2014KJXX-41).

Author information

Corresponding author

Correspondence to Yong Fang.

About this article

Cite this article

Dai, Y., Fang, Y., Yang, L. et al. Graphics processing unit-accelerated joint-bitplane belief propagation algorithm in DSC. J Supercomput 72, 2351–2375 (2016). https://doi.org/10.1007/s11227-016-1736-5

  • DOI: https://doi.org/10.1007/s11227-016-1736-5
