Lightweight Huffman Coding for Efficient GPU Compression

Authors:

Michela Becchi,

Franck Cappello

ICS '23: Proceedings of the 37th International Conference on Supercomputing

Pages 99 - 110

https://doi.org/10.1145/3577193.3593736

Published: 21 June 2023

Abstract

Lossy compression is often deployed in scientific applications to reduce data footprint and improve data transfers and I/O performance. Especially for applications requiring on-the-fly compression, it is essential to minimize compression runtime. In this paper, we design a scheme to improve the performance of cuSZ, a GPU-based lossy compressor. We observe that Huffman coding, which cuSZ uses to compress metadata generated during compression, incurs a performance overhead that can be significant, especially for smaller datasets. Our work seeks to reduce the Huffman coding runtime with minimal-to-no impact on cuSZ's compression efficiency.

Our contributions are as follows. First, we examine a variety of probability distributions to determine which ones closely model the input to cuSZ's Huffman coding stage. From these distributions, we create a dictionary of pre-computed codebooks, so that during compression a codebook is selected from the dictionary instead of being computed for each input. Second, we explore three codebook selection criteria to be applied at runtime. Finally, we evaluate our scheme on real-world datasets and in the context of two important application use cases, HDF5 and MPI, using an NVIDIA A100 GPU. Our evaluation shows that our method can reduce the Huffman coding penalty by 78--92×, translating to a total speedup of up to 5× over baseline cuSZ. Compression with smaller HDF5 chunk sizes is more than 8× faster, and MPI messages on the scale of tens of MB see a 1.4--30.5× speedup in communication time.
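The dictionary idea can be illustrated with a small CPU-side sketch. It is not the authors' implementation and does not use cuSZ's API. The Laplace-like model family, the scale values, the 32-symbol alphabet, and the function names (`huffman_code_lengths`, `build_dictionary`, `select_codebook`) are all assumptions made for illustration. The selection rule shown here (pick the codebook that yields the fewest encoded bits for the observed histogram) is only one plausible criterion and is not necessarily one of the three the paper explores.

```python
import heapq
import math
from collections import Counter


def huffman_code_lengths(weights):
    """Return the Huffman code length (in bits) of each symbol."""
    if len(weights) == 1:
        return [1]
    # Heap entries: (subtree weight, unique tie-breaker, symbols in subtree).
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    next_id = len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


def laplace_like_pmf(num_symbols, scale):
    """Hypothetical model: probability mass peaked at the middle symbol."""
    center = num_symbols // 2
    raw = [math.exp(-abs(k - center) / scale) for k in range(num_symbols)]
    total = sum(raw)
    return [r / total for r in raw]


def build_dictionary(num_symbols, scales):
    """Pre-compute one codebook (as code lengths) per model distribution."""
    return {
        scale: huffman_code_lengths(laplace_like_pmf(num_symbols, scale))
        for scale in scales
    }


def encoded_bits(histogram, lengths):
    """Total bits needed to encode the histogram with the given codebook."""
    return sum(count * lengths[sym] for sym, count in histogram.items())


def select_codebook(symbols, dictionary):
    """Assumed criterion: choose the codebook with the smallest encoded size."""
    histogram = Counter(symbols)
    return min(dictionary, key=lambda k: encoded_bits(histogram, dictionary[k]))


# Offline step: build the dictionary once (scale values are illustrative).
dictionary = build_dictionary(num_symbols=32, scales=(0.5, 2.0, 8.0))

# Runtime step: histogram the input and look up the best-fitting codebook.
peaked_input = [16] * 100
print("selected scale:", select_codebook(peaked_input, dictionary))
```

In this sketch the Huffman tree construction happens only when the dictionary is built. At compression time the work reduces to one histogram and one weighted sum per candidate codebook, which is the kind of saving the abstract attributes to selecting a pre-computed codebook instead of computing a custom one.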

Index Terms

Lightweight Huffman Coding for Efficient GPU Compression
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms
2. Information systems
  1. Data management systems
    1. Data structures
      1. Data layout
        1. Data compression

Published In

ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing

June 2023

505 pages

ISBN: 9798400700569

DOI: 10.1145/3577193

Chair: Kyle Gallivan
Co-chair: Efstratios Gallopoulos
Program Co-chairs: Dimitrios S. Nikolopoulos, Ramon Beivide

Copyright © 2023 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICS '23

Sponsor:

SIGARCH

ICS '23: 37th International Conference on Supercomputing

June 21 - 23, 2023

Orlando, FL, USA

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

Goel R, Schütz M, Narayanan P, and Kerbl B. 2024. Real-Time Decompression and Rasterization of Massive Point Clouds. Proceedings of the ACM on Computer Graphics and Interactive Techniques 7, 3 (2024), 1--15. Online publication date: 9-Aug-2024. https://dl.acm.org/doi/10.1145/3675373

Song S, Huang Y, Jiang P, Yu X, Zheng W, Di S, Cao Q, Feng Y, Xie Z, Cappello F, Mencagli G, Dazzi P, Lowenthal D, and Badia R. 2024. CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing. 309--321. Online publication date: 3-Jun-2024. https://dl.acm.org/doi/10.1145/3625549.3658691

Huang Y, Di S, Li G, and Cappello F. 2024. CUSZP2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio. SC24: International Conference for High Performance Computing, Networking, Storage and Analysis. 1--18. Online publication date: 17-Nov-2024. https://doi.org/10.1109/SC41406.2024.00021

Liu J, Tian J, Wu S, Di S, Zhang B, Underwood R, Huang Y, Huang J, Zhao K, Li G, Tao D, Chen Z, and Cappello F. 2024. cuSZ-i: High-Ratio Scientific Lossy Compression on GPUs with Optimized Multi-Level Interpolation. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. 1--15. Online publication date: 17-Nov-2024. https://dl.acm.org/doi/10.1109/SC41406.2024.00019
