Abstract
The fast JPEG image compression algorithm is a requisite in many applications such as high-speed video measurement systems and digital cinema. Many existing methods have implemented the JPEG compression in parallel based on GPU except for entropy coding, which is a variable-length coding method and seems like a better fit for sequential implementation. However, entropy coding is an essential part of the JPEG compression system and typically takes up a large proportion of the time when implemented on the CPU. To tackle this problem, we propose an efficient parallel entropy coding (EPEnt) method for parallel JPEG compressing. The proposed method conducts entropy coding in three parallel steps: coding, shifting, and stuffing. Specifically, according to the different characteristics of image components, we devise thread-based and warp-based functions in the coding stage to further improve the efficiency under guaranteeing image quality, respectively. We apply the proposed method to the parallel JPEG compression system and evaluate the performance based on compute unified device architecture (CUDA). The experimental results demonstrate that compared with sequential implementation, the maximum speedup ratio of entropy coding can reach 39 times without affecting compressed images quality. Meanwhile, the whole JPEG compression process efficiency increases by at least 28% compared with state-of-the-art parallel methods in terms of speedup ratio.
















Similar content being viewed by others
Explore related subjects
Discover the latest articles and news from researchers in related subjects, suggested using machine learning.References
Aguilar AH, Bonilla-Robles JC, Díaz JCZ et al (2019) Real-time video image processing through GPUs and CUDA and its future implementation in real problems in a Smart City. Int J Combinat Optim Prob Inform 10(3):33–49
Haiyan Zhang A (2002) Image compression. Technology 14(7):831–835
Li J, Wu J, Jeon G et al (2020) GPU acceleration of clustered DPCM for lossless compression of hyperspectral Images. IEEE Trans Industr Inf 16(5):2906–2916
Wallace GK (1991) The JPEG still picture compression standard. Commun ACM 34(4):30–44
Tadisetty S (2019) A novel ortho normalized multi-stage discrete fast Stockwell transform based memory-aware high-speed VLSI implementation for image compression. Multim Tools Appl 78(13):17673–17699
Salah A, Li K, Hosny KM et al (2020) Accelerated CPU–GPUs implementations for quaternion polar harmonic transform of color images. Futur Gener Comput Syst 107:368–382
Spiliotis IM, Bekakos MP, Boutalis YS (2020) Parallel implementation of the image block representation using OpenMP. J Parall Distrib Comput 137:134–147
Hosny KM, Salah A, Saleh HI et al (2019) Fast computation of 2D and 3D Legendre moments using multi-core CPUs and GPU parallel architectures. J Real-Time Image Proc 16(6):2027–2041
Yuan Y, Yang X, Wu W et al (2019) A fast single-image super-resolution method implemented with CUDA. J Real-Time Image Proc 16(1):81–97
Alqudami N, Kim SD (2016) OpenCL-based optimization methods for utilizing forward DCT and quantization of image compression on a heterogeneous platform. J Real-Time Image Proc 12(2):219–235
Ghetia S, Gajjar N, Gajjar R (2013) Implementation of 2-D discrete cosine transform algorithm on GPU. Int J Adv Res Electric Electron Instrum Eng 2(7):3024–3030
Haweel RT, El-Kilani WS, Ramadan HH (2016) Fast approximate DCT with GPU implementation for image compression. J Vis Commun Image Represent 40:357–365
Obukhov A, Kharlamov A (2008) Discrete cosine transform for 8x8 blocks with CUDA. NVIDIA white paper
Tokdemir S, Belkasim S. Parallel processing of DCT on GPU. 2011 Data Compression Conference. IEEE, 2011: 479–479
Shan R, Zhou X, Wang CY et al (2016) All phase discrete sine biorthogonal transform and its application in JPEG-like image coding using GPU. TIIS 10(9):4467–4486
Wang C, Shan R, Zhou X (2015) APBT-JPEG image coding based on GPU. KSII Trans Int Inform Syst (TIIS) 9(4):1457–1470
Shatnawi MKA, Shatnawi HA A performance model of fast 2D-DCT parallel JPEG encoding using CUDA GPU and SMP-architecture. 2014 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2014: 1-6
Liu D, Fan XY. Parallel program design for JPEG compression encoding. 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery. IEEE, 2012: 2502–2506.
Enfedaque P, Auli-Llinas F, Moure JC. Strategies of SIMD computing for image coding in GPU. 2015 IEEE 22nd International Conference on High Performance Computing (HiPC). IEEE, 2015: 345–354.
Balevic A. Parallel variable-length encoding on GPGPUs. International Conference on Parallel Processing, 2009: 26–35.
Patel P, Wong J, Tatikonda M, et al. JPEG compression algorithm using CUDA. Department of Computer Engineering, University of Toronto, Course Project for ECE, 2009, 1724.
Zhang M, Zhang J, Qiu X (2017) Parallel design and implementation of JPEG compression algorithm based on OpenCL. Comput Eng Sci 39(5):855–860
Rahmani H, Topal C, Akinlar C (2014) A parallel Huffman coder on the CUDA architecture[C]. In: IEEE Visual Communications and Image Processing Conference, vol 2014. IEEE, pp 311–314
Sudarshan ESC and Chigarapalle S, 2017 A compact parallel Huffman entropy coding technique on GPGPU using CUDA. ARPN J Eng Appl Sci 7111–7118.
Single pass prefix sum in a vertex shader. U.S. Patent Application 16/007,893. 2019.
M. Harris, S. Sengupta, J. D. Owens, H. Nguyen. Parallal prefix Sum (Scan) with CUDA, in: GPU Gems 3 Part VI: GPU Computing, Addison Wesley, 2007: 851–876.
Sengupta S, A. E Lefohn, J.D. Owens. A work-efficient step-efficient prefix sum algorithm, in: Workshop on Edge Computing Using New Commodity Architectures, 2006.
Shan R, Wang C, Huang W, Zhou X (2015) DCT-JPEG image coding based on GPU. Int J Hybrid Inform Technol 8(5):293–302
NVIDIA CUDA C++ Programming Guide, 10.2, 2018
Harris M. Optimizing parallel reduction in cuda, [online] Available: https://developer.download. nvidia.com/assets/cuda/files/reduction.pdf.
Sodsong W, Jung M, Park J, et al. JParEnt: Parallel entropy decoding for JPEG decompression on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience, 2017, 29(15).
Hore A, Ziou D. Image Quality Metrics: PSNR vs. SSIM. International Conference on Pattern Recognition, 2010: 2366–2369.
Pereira AD, Ramos L, Goes LF et al (2015) PSkel: A stencil programming framework for CPU-GPU systems. Concurren Comput Prac Exp 27(17):4938–4953
Tian J , Rivera C , Di S , et al. Revisiting huffman coding: toward extreme performance on modern GPU Architectures[C]// The 35th IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2020.
Yamamoto N, Nakano K, Ito Y, et al. Huffman Coding with Gap Arrays for GPU Acceleration[C]//49th International Conference on Parallel Processing-ICPP. 2020: 1–11.
Acknowledgements
The authors would sincerely like to thank the editor and anonymous reviewers for their detailed review.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Zhu, F., Yan, H. An efficient parallel entropy coding method for JPEG compression based on GPU. J Supercomput 78, 2681–2708 (2022). https://doi.org/10.1007/s11227-021-03971-6
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-021-03971-6