DOI: 10.1145/2883404.2883423
research-article

JParEnt: Parallel Entropy Decoding for JPEG Decompression on Heterogeneous Multicore Architectures

Published: 12 March 2016

Abstract

The JPEG format is the de facto image compression standard, with billions of views every day. Parallelizing the entropy decoding step of the JPEG decompression algorithm remains a challenging problem because codewords have variable length, and the start position of a codeword in the bitstream is not known until the previous codeword has been decoded.
In this paper, we present JParEnt, a novel parallel entropy decoding method for JPEG decompression on heterogeneous multicores. JParEnt applies a fast block boundary scan on the CPU to determine the start positions of coefficient blocks in the bitstream, followed by parallel entropy decoding on the GPU. Our pipelined execution scheme exploits parallelism between CPU and GPU, and overlaps almost all CPU-to-GPU data transfers with GPU kernel executions.
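The two-phase idea described above (a sequential boundary scan followed by independent per-block decoding) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four-symbol prefix code, the end-of-block symbol `E`, the function names, and the use of a thread pool as a stand-in for the GPU. It is not the JParEnt implementation and does not use real JPEG Huffman tables.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy prefix code (hypothetical, not a JPEG Huffman table); "E" marks end of block.
CODE = {"a": "0", "b": "10", "c": "110", "E": "111"}
DECODE = {bits: sym for sym, bits in CODE.items()}


def encode(blocks):
    """Concatenate the codewords of all blocks, terminating each block with 'E'."""
    return "".join(CODE[s] for block in blocks for s in block + "E")


def scan_boundaries(bitstream):
    """Sequential pass: record the bit offset at which each block starts."""
    starts, pos, word = [0], 0, ""
    while pos < len(bitstream):
        word += bitstream[pos]
        pos += 1
        if word in DECODE:
            # A new block begins right after every end-of-block codeword,
            # except the one that terminates the stream.
            if DECODE[word] == "E" and pos < len(bitstream):
                starts.append(pos)
            word = ""
    return starts


def decode_block(bitstream, start):
    """Decode one block from a known start offset, stopping at its 'E'."""
    out, word = [], ""
    for bit in bitstream[start:]:
        word += bit
        if word in DECODE:
            if DECODE[word] == "E":
                break
            out.append(DECODE[word])
            word = ""
    return "".join(out)


def parallel_decode(bitstream):
    """Phase 1 finds block starts; phase 2 decodes the blocks independently."""
    starts = scan_boundaries(bitstream)
    with ThreadPoolExecutor() as pool:  # stand-in for the GPU in this sketch
        return list(pool.map(lambda s: decode_block(bitstream, s), starts))


if __name__ == "__main__":
    bits = encode(["ab", "c", "", "bca"])
    print(scan_boundaries(bits))
    print(parallel_decode(bits))
```

Once the start offsets are known, no block depends on the decoding state of another block, which is what makes the second phase parallelizable in this toy setting.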
We have evaluated JParEnt's performance for more than 1000 images on four heterogeneous multicore platforms, including one embedded board. JParEnt is up to 4.3× faster than the SIMD implementation of the libjpeg-turbo library. On average, JParEnt's CPU-based boundary scan consumes 45% of the sequential entropy decoding time of libjpeg-turbo. Given this new constant for the non-parallelizable part of JPEG decompression, JParEnt achieves up to 97% of the theoretically attainable speedup, with an average of 95%.
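To make the role of the 45% figure concrete, the following sketch applies an Amdahl-style bound. It is an illustration under stated assumptions, not the model used in the paper: it assumes that the boundary scan is the only part that stays sequential and that everything else can be accelerated without limit. The parameter `entropy_share` (the share of total decompression time spent in entropy decoding) is hypothetical and is not given in the abstract; the example therefore evaluates only the entropy decoding stage in isolation.

```python
def amdahl_bound(serial_fraction: float) -> float:
    """Upper bound on speedup when the non-serial part is accelerated without limit."""
    if not 0.0 < serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in (0, 1]")
    return 1.0 / serial_fraction


def serial_fraction_of_pipeline(scan_share: float, entropy_share: float) -> float:
    """Serial share of the whole pipeline if only the boundary scan stays sequential.

    scan_share: boundary-scan time relative to sequential entropy decoding
                (0.45 on average according to the abstract).
    entropy_share: hypothetical share of total decode time spent in entropy
                   decoding; not stated in the abstract.
    """
    return scan_share * entropy_share


# Entropy decoding stage in isolation (entropy_share = 1.0):
stage_bound = amdahl_bound(serial_fraction_of_pipeline(0.45, 1.0))
print(f"{stage_bound:.2f}x")  # prints "2.22x"
```

Under these assumptions, a smaller `entropy_share` raises the bound for the whole pipeline, because the scan then accounts for a smaller part of the total sequential time.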


Published In

PMAM'16: Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores
March 2016
127 pages
ISBN: 9781450341967
DOI: 10.1145/2883404

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. JPEG compression
  2. entropy decoding
  3. variable-length codes

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PPoPP '16

Acceptance Rates

Overall Acceptance Rate 53 of 97 submissions, 55%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 19
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 16 Feb 2025

Cited By

  • (2022) DIESEL+: Accelerating Distributed Deep Learning Tasks on Image Datasets. IEEE Transactions on Parallel and Distributed Systems, 33(5):1173-1184. DOI: 10.1109/TPDS.2021.3104252. Online publication date: 1-May-2022
  • (2021) Accelerating JPEG Decompression on GPUs. 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC), pages 121-130. DOI: 10.1109/HiPC53243.2021.00026. Online publication date: Dec-2021
  • (2020) Accelerating Deep Learning Tasks with Optimized GPU-assisted Image Decoding. 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), pages 274-281. DOI: 10.1109/ICPADS51040.2020.00045. Online publication date: Dec-2020
  • (2018) Massively Parallel Huffman Decoding on GPUs. Proceedings of the 47th International Conference on Parallel Processing, pages 1-10. DOI: 10.1145/3225058.3225076. Online publication date: 13-Aug-2018
  • (2017) JParEnt: Parallel entropy decoding for JPEG decompression on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience, 29(15). DOI: 10.1002/cpe.4111. Online publication date: 10-Jul-2017
