
Massively Parallel Huffman Decoding on GPUs

Published: 13 August 2018

Abstract

Data compression is a fundamental building block in a wide range of applications. Besides its intended purpose of saving valuable storage on hard disks, compression can be used to increase the effective bandwidth to attached storage, as realized by state-of-the-art file systems. In the foreseeable future, on-the-fly compression and decompression will gain utmost importance for data-intensive applications such as streamed Deep Learning tasks or Next Generation Sequencing pipelines, which establishes the need for fast parallel implementations. Huffman coding is an integral part of a number of compression methods. However, efficient parallel implementation of Huffman decompression is difficult due to inherent data dependencies (i.e., the location of a decoded symbol depends on its predecessors). In this paper, we present the first massively parallel decoder implementation that is compatible with Huffman's original method by taking advantage of the self-synchronization property of Huffman codes. Our performance evaluation on three different CUDA-enabled GPUs (TITAN V, TITAN XP, GTX 1080) demonstrates speedups of over one order of magnitude compared to the state-of-the-art CPU-based Zstandard Huffman decoder. Our implementation is available at https://github.com/weissenberger/gpuhd.
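The self-synchronization property mentioned in the abstract can be illustrated with a small sequential sketch. This is not the paper's GPU algorithm; the code table `CODE` and the helper names `encode`, `boundaries`, and `sync_point` are hypothetical and chosen only for illustration. A decoder that starts at an arbitrary bit offset may initially emit wrong symbols, but for many prefix codes it soon lands on a true codeword boundary and agrees with the correct decoding from there on.

```python
# Hypothetical prefix code for illustration only (not taken from the paper).
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
DECODE = {bits: sym for sym, bits in CODE.items()}


def encode(text):
    """Concatenate the codewords of all symbols into one bit string."""
    return "".join(CODE[ch] for ch in text)


def boundaries(bits, start):
    """Decode greedily from bit offset `start`.

    Returns a list of (end_position, symbol) pairs, where end_position is
    the index of the first bit after the decoded codeword.
    """
    out = []
    cur = ""
    for pos in range(start, len(bits)):
        cur += bits[pos]
        if cur in DECODE:
            out.append((pos + 1, DECODE[cur]))
            cur = ""
    return out


def sync_point(bits, start):
    """Return the first codeword boundary that a decoder started at `start`
    shares with a decoder started at bit 0, or None if they never meet
    (synchronization is not guaranteed for every code and input)."""
    true_ends = {end for end, _ in boundaries(bits, 0)}
    for end, _ in boundaries(bits, start):
        if end in true_ends:
            return end
    return None


bits = encode("badcab")        # "100111110010"
print(sync_point(bits, 4))     # 9
```

Starting at bit 4 (inside the codeword for `d`), the sketch first decodes two incorrect symbols and then reaches bit 9, a genuine codeword boundary; every symbol decoded after that point matches the correct output. A parallel decoder can, in principle, exploit this by letting each worker start at a guessed offset and later reconciling where neighbouring workers synchronized.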



Published In

ICPP '18: Proceedings of the 47th International Conference on Parallel Processing
August 2018
945 pages
ISBN:9781450365109
DOI:10.1145/3225058

In-Cooperation

  • University of Oregon

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CUDA
  2. Data compression
  3. GPUs
  4. Huffman Decoding

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2018

Acceptance Rates

ICPP '18 Paper Acceptance Rate 91 of 313 submissions, 29%;
Overall Acceptance Rate 91 of 313 submissions, 29%


Cited By

  • (2024) Real-Time Decompression and Rasterization of Massive Point Clouds. Proceedings of the ACM on Computer Graphics and Interactive Techniques 7:3 (1-15). DOI: 10.1145/3675373. Online publication date: 9-Aug-2024
  • (2024) Massively Parallel Inverse Block-sorting Transforms for bzip2 Decompression on GPUs. Proceedings of the 53rd International Conference on Parallel Processing (856-865). DOI: 10.1145/3673038.3673067. Online publication date: 12-Aug-2024
  • (2024) CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing (309-321). DOI: 10.1145/3625549.3658691. Online publication date: 3-Jun-2024
  • (2024) Fast Compressed Segmentation Volumes for Scientific Visualization. IEEE Transactions on Visualization and Computer Graphics 30:1 (12-22). DOI: 10.1109/TVCG.2023.3326573. Online publication date: 1-Jan-2024
  • (2023) Liquid. Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems (50-57). DOI: 10.1145/3609510.3609811. Online publication date: 24-Aug-2023
  • (2023) Recoil: Parallel rANS Decoding with Decoder-Adaptive Scalability. Proceedings of the 52nd International Conference on Parallel Processing (31-40). DOI: 10.1145/3605573.3605588. Online publication date: 7-Aug-2023
  • (2023) Rapidgzip: Parallel Decompression and Seeking in Gzip Files Using Cache Prefetching. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (295-307). DOI: 10.1145/3588195.3592992. Online publication date: 7-Aug-2023
  • (2023) Editing Compressed High‐resolution Voxel Scenes with Attributes. Computer Graphics Forum 42:2 (235-243). DOI: 10.1111/cgf.14757. Online publication date: 23-May-2023
  • (2023) Energy-efficient canonical Huffman decoders on many-core processor arrays and FPGAs. Integration, the VLSI Journal 88:C (156-165). DOI: 10.1016/j.vlsi.2022.09.015. Online publication date: 1-Jan-2023
  • (2022) L3: Accelerator-Friendly Lossless Image Format for High-Resolution, High-Throughput DNN Training. Computer Vision – ECCV 2022 (171-188). DOI: 10.1007/978-3-031-20083-0_11. Online publication date: 3-Nov-2022
