Abstract
In distributed source coding (DSC), an M-ary source with nonstationary correlation can be encoded with a single binary low-density parity-check (LDPC) code, and its bitplanes can be decoded jointly. Joint-bitplane belief propagation (JBBP) is an effective algorithm for decoding the multiple bitplanes of an M-ary source together. However, it has low computational efficiency and a long execution time. Motivated by the evolution of the Graphics Processing Unit (GPU) and the inherent parallelism of JBBP, we propose a novel approach to the computationally intensive processing of the JBBP algorithm on the GPU using the Compute Unified Device Architecture (CUDA) programming model. Two different parallel modes are used for passing beliefs between the different nodes of JBBP. We find that the bottlenecks of JBBP lie in computing the overall probability mass functions (pmfs) of the symbol nodes and the overall beliefs of the bit nodes. Thus, a data partitioning method is used to split a large array of pmfs into small pieces that can be loaded into the L1 cache instead of being read from global memory. The optimal block size is selected so that it not only assigns as much L1 cache as possible to each thread but also guarantees multiple active warps on each streaming multiprocessor. Experimental results show that when the length-6336 (resp. length-50,688) LDPC accumulate (LDPCA) code is used to compress the source, the JBBP decoder achieves about a 20× (resp. 41×) speedup on the GPU compared with the original C code on the CPU. Further speedup is expected with longer LDPCA codes. Moreover, the parallel JBBP is also applied to hyperspectral image compression and video coding, where it achieves good speedups.
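The symbol-node bottleneck and the data-partitioning idea can be illustrated with a small CPU-side sketch. It assumes a simplified product-form update in which each symbol's overall pmf is its prior (correlation) pmf multiplied by one binary message per bitplane and then normalized. The function names, the 48 KB cache budget, and the 4-byte value size are hypothetical choices for illustration; this is not the authors' CUDA implementation or their exact JBBP update.

```python
from typing import List

Pmf = List[float]


def chunk_symbols(cache_bytes: int, M: int, bytes_per_value: int = 4) -> int:
    """Number of symbols whose M-entry pmfs fit in a cache budget (at least 1).

    cache_bytes and bytes_per_value are illustrative assumptions,
    not values taken from the paper.
    """
    return max(1, cache_bytes // (M * bytes_per_value))


def overall_symbol_pmfs(prior: List[Pmf], bit_msgs: List[List[float]], M: int,
                        cache_bytes: int = 48 * 1024) -> List[Pmf]:
    """Combine each symbol's prior pmf with its per-bitplane messages.

    Assumed (simplified) form: P_i(s) is proportional to
    prior[i][s] * prod_l m_{i,l}(bit_l(s)), where bit_msgs[i][l] is the
    message's probability that bit l of symbol i equals 1.
    """
    if M < 2 or M & (M - 1):
        raise ValueError("M must be a power of two")
    L = M.bit_length() - 1          # number of bitplanes per symbol
    n = len(prior)
    step = chunk_symbols(cache_bytes, M)
    out: List[Pmf] = []
    # Outer loop walks the pmf array tile by tile (the data-partitioning idea).
    # In pure Python this only shows the loop structure, not real cache effects.
    for start in range(0, n, step):
        for i in range(start, min(start + step, n)):
            pmf = []
            for s in range(M):
                w = prior[i][s]
                for l in range(L):
                    q = bit_msgs[i][l]
                    # Use q if bit l of s is 1, otherwise 1 - q.
                    w *= q if (s >> l) & 1 else 1.0 - q
                pmf.append(w)
            z = sum(pmf)
            # Normalize; an all-zero product is left unnormalized.
            out.append([w / z for w in pmf] if z > 0 else pmf)
    return out


if __name__ == "__main__":
    prior = [[0.25] * 4, [0.1, 0.2, 0.3, 0.4]]
    msgs = [[1.0, 0.0], [0.5, 0.5]]
    print(overall_symbol_pmfs(prior, msgs, M=4))
```

Walking the pmf array in fixed-size tiles, as the outer loop does, is the access pattern that would let a GPU kernel keep each tile in fast on-chip memory. On a real GPU the tile size would additionally be constrained by the threads per block and the number of active warps per streaming multiprocessor, which this sketch does not model.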

Acknowledgments
This work was supported by the National Science Foundation of China (grant no. 61271280), the Fundamental Research Fund for the Central Universities of China (grant nos. 2452015059, 2014YQ001, and QN2013086), the Program for New Century Excellent Talents in University of China (grant no. NCET-13-0481), the Provincial Science Foundation of Shaanxi, China-Key Project (grant no. 2016JZ024), and the Program for Youth Sci-Tech Nova of Shaanxi, China (grant no. 2014KJXX-41).
Cite this article
Dai, Y., Fang, Y., Yang, L. et al. Graphics processing unit-accelerated joint-bitplane belief propagation algorithm in DSC. J Supercomput 72, 2351–2375 (2016). https://doi.org/10.1007/s11227-016-1736-5