ABSTRACT
Huffman coding is a fundamental lossless data compression scheme used in many compression file formats such as gzip, zip, PNG, and JPEG. Huffman encoding is easily parallelized, because all 8-bit symbols can be converted into codewords independently. In contrast, an encoded codeword sequence contains no separators identifying the individual codewords, so parallelizing Huffman decoding is a much harder task. This work presents a new data structure called the gap array, attached to the encoded codeword sequence of Huffman coding, for accelerating parallel Huffman decoding. In addition, it shows that GPU Huffman encoding and decoding can be accelerated by several techniques: (1) Single Kernel Soft Synchronization (SKSS), (2) wordwise global memory access, and (3) compact codebooks. Experimental results for 10 files on an NVIDIA Tesla V100 GPU show that our GPU Huffman encoding and decoding run 2.87–7.70 times and 1.26–2.63 times faster, respectively, than previously presented GPU Huffman encoding and decoding. Huffman decoding can be further accelerated by a factor of 1.67–6450 if a gap array is attached to the encoded codeword sequence. Since the size and computing overhead of gap arrays in Huffman encoding are small, we conclude that gap arrays should be introduced into GPU Huffman encoding and decoding.
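To make the gap-array idea concrete, here is a minimal CPU-side sketch in Python. It assumes one plausible reading of the abstract: the gap array stores, for each fixed-width block of the encoded bit stream, the offset from the block's start to the first codeword boundary inside it, so each block can then be decoded independently. The names `SEGMENT_BITS`, `encode_with_gaps`, and `decode_segment` are our own illustration, not the paper's API, and the paper's actual CUDA implementation differs.

```python
import bisect

SEGMENT_BITS = 8  # deliberately small so a toy input spans many segments

def encode_with_gaps(data, codebook):
    """Concatenate codewords; for each SEGMENT_BITS-wide block of the
    bit stream, record the gap: the offset from the block's start to the
    first codeword boundary at or after it."""
    stream = "".join(codebook[s] for s in data)
    boundaries, pos = [0], 0          # bit positions where codewords start/end
    for s in data:
        pos += len(codebook[s])
        boundaries.append(pos)
    gaps = []
    for seg in range(0, len(stream), SEGMENT_BITS):
        b = boundaries[bisect.bisect_left(boundaries, seg)]
        gaps.append(b - seg)
    return stream, gaps

def decode_segment(stream, gaps, inv, k):
    """Decode every codeword that *starts* inside segment k; a codeword
    may spill into segment k+1, so reading continues past the boundary."""
    i = k * SEGMENT_BITS + gaps[k]
    limit = min((k + 1) * SEGMENT_BITS, len(stream))
    out = []
    while i < limit:
        buf = ""
        while buf not in inv:         # prefix-free code: first hit is the match
            buf += stream[i]
            i += 1
        out.append(inv[buf])
    return out
```

Because each `decode_segment` call depends only on the stream, the gap array, and the codebook, the calls are mutually independent; on a GPU each thread (or thread block) would decode one segment, which is what makes the decoder parallel despite the stream having no separators.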