DOI: 10.1145/3625549.3658662
Research article
Open access

A Portable, Fast, DCT-based Compressor for AI Accelerators

Published: 30 August 2024

Abstract

Lossy compression can be an effective tool in AI training and inference for reducing memory requirements, storage footprint, and, in some cases, execution time. With the rise of novel architectures designed to accelerate AI workloads, compression can continue to serve these purposes, but it must be adapted to the new accelerators. Due to differences in programmability and architecture, existing lossy compressors cannot be directly ported to, and are not optimized for, these AI accelerators, so new compression designs are required.
In this paper, we propose a novel, portable, DCT-based lossy compressor that can be used across a variety of AI accelerators. More specifically, we make the following contributions: 1) We propose a DCT-based lossy compressor design for training data that uses operators supported across four state-of-the-art AI accelerators: the Cerebras CS-2, SambaNova SN30, Groq GroqChip, and Graphcore IPU. 2) We design two optimization techniques that enable higher-resolution compressed data on certain platforms and an improved compression ratio on the IPU. 3) We evaluate our compressor's ability to preserve accuracy on four benchmarks, three of which are AI-for-science benchmarks that go beyond image classification. Our experiments show that accuracy degradation can be limited to 3% or less, and in some cases compression even improves accuracy. 4) We study compression/decompression time as a function of resolution and batch size, finding that our compressor can achieve throughputs on the order of tens of GB/s, depending on the platform.
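
To make the idea behind contribution 1) concrete for readers unfamiliar with transform-based compression, the sketch below illustrates the general mechanism of a DCT-based lossy compressor on a single 8x8 tile: transform to the frequency domain, store only the low-frequency coefficients, and invert the transform on decompression. This is a minimal NumPy/SciPy illustration of the general technique under assumed parameters (8x8 blocks, a 4x4 block of kept coefficients, no quantization step); it is not the authors' accelerator implementation.

```python
# Minimal sketch of DCT-based lossy compression on a single tile.
# Illustrative only: the block size, kept-coefficient count, and the lack of a
# quantization step are assumptions, not the paper's actual design.
import numpy as np
from scipy.fft import dctn, idctn

BLOCK = 8  # assumed tile size (JPEG-style)
KEEP = 4   # assumed number of low-frequency coefficients kept per dimension

def compress_tile(tile: np.ndarray) -> np.ndarray:
    """Forward 2-D DCT, then keep only the KEEP x KEEP low-frequency
    coefficients; discarding the high frequencies is where the loss occurs."""
    coeffs = dctn(tile, norm="ortho")
    return coeffs[:KEEP, :KEEP].copy()  # only these values need to be stored

def decompress_tile(stored: np.ndarray) -> np.ndarray:
    """Zero-pad the stored coefficients back to BLOCK x BLOCK and invert."""
    coeffs = np.zeros((BLOCK, BLOCK), dtype=stored.dtype)
    coeffs[:KEEP, :KEEP] = stored
    return idctn(coeffs, norm="ortho")

# Example: one 8x8 tile of a normalized training image.
rng = np.random.default_rng(0)
tile = rng.standard_normal((BLOCK, BLOCK)).astype(np.float32)
stored = compress_tile(tile)          # 16 values stored instead of 64
restored = decompress_tile(stored)    # lossy reconstruction
print("compression ratio:", tile.size / stored.size)          # 4.0 here
print("max abs error:", float(np.abs(tile - restored).max()))
```

On an AI accelerator, the same block-wise transform would be expressed with the platform's supported matrix/tensor operators rather than a SciPy call, which is precisely the portability constraint the paper addresses.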


Published In

HPDC '24: Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing
June 2024
436 pages
ISBN: 9798400704130
DOI: 10.1145/3625549
ACM acknowledges that this contribution was authored or co-authored by an employee, contractor, or affiliate of the United States government. As such, the United States government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 August 2024


Author Tags

  1. compression
  2. AI accelerator
  3. ML training

Qualifiers

  • Research-article

Conference

HPDC '24

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%

