DOI: 10.1145/3625549.3658662
Research article
Open access

A Portable, Fast, DCT-based Compressor for AI Accelerators

Published: 30 August 2024

Abstract

Lossy compression can be an effective tool in AI training and inference for reducing memory requirements, storage footprint, and, in some cases, execution time. With the rise of novel architectures designed to accelerate AI workloads, compression can continue to serve these purposes, but it must be adapted to the new accelerators. Due to differences in programmability and architecture, existing lossy compressors cannot be directly ported to, and are not optimized for, these AI accelerators, so new compression designs are required.
In this paper, we propose a novel, portable, DCT-based lossy compressor that can be used across a variety of AI accelerators. More specifically, we make the following contributions: 1) We propose a DCT-based lossy compressor design for training data that uses operators supported across four state-of-the-art AI accelerators: the Cerebras CS-2, SambaNova SN30, Groq GroqChip, and Graphcore IPU. 2) We design two optimization techniques that enable higher-resolution compressed data on certain platforms and an improved compression ratio on the IPU. 3) We evaluate our compressor's ability to preserve accuracy on four benchmarks, three of which are AI-for-science benchmarks that go beyond image classification. Our experiments show that accuracy degradation can be limited to 3% or less, and in some cases compression even improves accuracy. 4) We study compression/decompression time as a function of resolution and batch size, finding that our compressor can achieve throughputs on the order of tens of GB/s, depending on the platform.
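
To make the idea behind contribution 1) concrete for readers unfamiliar with transform-based compression, the sketch below illustrates the general mechanism of a DCT-based lossy compressor on a single 8x8 tile: transform to the frequency domain, store only the low-frequency coefficients, and invert the transform on decompression. This is a minimal NumPy/SciPy illustration of the general technique under assumed parameters (8x8 blocks, a 4x4 block of kept coefficients, no quantization step); it is not the authors' accelerator implementation.

```python
# Minimal sketch of DCT-based lossy compression on a single tile.
# Illustrative only: the block size, kept-coefficient count, and the lack of a
# quantization step are assumptions, not the paper's actual design.
import numpy as np
from scipy.fft import dctn, idctn

BLOCK = 8  # assumed tile size (JPEG-style)
KEEP = 4   # assumed number of low-frequency coefficients kept per dimension

def compress_tile(tile: np.ndarray) -> np.ndarray:
    """Forward 2-D DCT, then keep only the KEEP x KEEP low-frequency
    coefficients; discarding the high frequencies is where the loss occurs."""
    coeffs = dctn(tile, norm="ortho")
    return coeffs[:KEEP, :KEEP].copy()  # only these values need to be stored

def decompress_tile(stored: np.ndarray) -> np.ndarray:
    """Zero-pad the stored coefficients back to BLOCK x BLOCK and invert."""
    coeffs = np.zeros((BLOCK, BLOCK), dtype=stored.dtype)
    coeffs[:KEEP, :KEEP] = stored
    return idctn(coeffs, norm="ortho")

# Example: one 8x8 tile of a normalized training image.
rng = np.random.default_rng(0)
tile = rng.standard_normal((BLOCK, BLOCK)).astype(np.float32)
stored = compress_tile(tile)          # 16 values stored instead of 64
restored = decompress_tile(stored)    # lossy reconstruction
print("compression ratio:", tile.size / stored.size)          # 4.0 here
print("max abs error:", float(np.abs(tile - restored).max()))
```

On an AI accelerator, the same block-wise transform would be expressed with the platform's supported matrix/tensor operators rather than a SciPy call, which is precisely the portability constraint the paper addresses.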


Published In

HPDC '24: Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing
June 2024
436 pages
ISBN: 9798400704130
DOI: 10.1145/3625549
ACM acknowledges that this contribution was authored or co-authored by an employee, contractor, or affiliate of the United States government. As such, the United States government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 August 2024


Author Tags

  1. compression
  2. AI accelerator
  3. ML training

Qualifiers

  • Research-article

Conference

HPDC '24

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%

