ABSTRACT
Memory is a scarce resource, and increasingly so in the age of deep neural networks. Memory compression is one solution to this scarcity. This work proposes NNW-BDI, a scheme for compressing pretrained neural network weights. NNW-BDI is a variation of the standard Base-Delta-Immediate (BDI) [13] compression technique, adapted to neural network weights through quantization, downscaling, randomized base selection, and base-delta configuration adjustment. We evaluate the scheme by compressing the weights of an MNIST classification network. Our evaluation shows that NNW-BDI reduces memory usage by up to 85% without any reduction in inference accuracy.
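To make the approach concrete, below is a minimal Python sketch of a BDI-style encoder operating on quantized weights. The quantization range, block size, candidate delta widths, and helper names (`quantize`, `bdi_compress_block`) are illustrative assumptions, not the paper's exact NNW-BDI configuration.

```python
# Minimal sketch of BDI-style compression for quantized neural network
# weights. All parameters here are assumptions for illustration, not the
# published NNW-BDI configuration.
import random
import numpy as np

def quantize(weights, bits=8, lo=-1.0, hi=1.0):
    """Uniformly quantize float weights into [0, 2**bits - 1] over a fixed range."""
    scale = (hi - lo) / (2**bits - 1)
    q = np.clip(np.round((weights - lo) / scale), 0, 2**bits - 1)
    return q.astype(np.int64)

def bdi_compress_block(block, delta_widths=(2, 4, 6)):
    """BDI-style encoding: one base value plus narrow per-element deltas.

    Tries each candidate delta width (in bits) and returns
    (base, width, deltas) for the narrowest width that covers the block,
    or None if the block must be stored uncompressed.
    """
    base = int(random.choice(block))      # randomized base selection
    deltas = block - base
    for nbits in delta_widths:            # base-delta configuration adjustment
        limit = 1 << (nbits - 1)          # signed delta range: [-limit, limit)
        if np.all((-limit <= deltas) & (deltas < limit)):
            return base, nbits, deltas
    return None

def block_bits(result, block_len):
    """Storage cost of one block: 8-bit base plus narrow deltas, or raw bytes."""
    if result is None:
        return 8 * block_len
    _, nbits, _ = result
    return 8 + nbits * block_len

# Example: 8-bit quantized weights, compressed in 32-element blocks.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=4096).astype(np.float32)  # small, clustered weights
q = quantize(w)
blocks = q.reshape(-1, 32)
results = [bdi_compress_block(b) for b in blocks]
orig_bits = 8 * q.size
comp_bits = sum(block_bits(r, 32) for r in results)
print(f"compressible blocks: {sum(r is not None for r in results)}/{len(blocks)}")
print(f"compression ratio: {orig_bits / comp_bits:.2f}x")
```

The narrower the delta width a block admits, the larger the savings; the quantization and downscaling steps exist precisely to cluster weight values so that narrow widths fit.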
REFERENCES
- A. Arunkumar, S. Lee, V. Soundararajan, and C. Wu. 2018. LATTE-CC: Latency Tolerance Aware Adaptive Cache Compression Management for Energy Efficient GPUs. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). 221–234.
- Davis Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, and John Guttag. 2020. What is the state of neural network pruning? arXiv preprint arXiv:2003.03033 (2020).
- Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2015. BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2 (Montreal, Canada) (NIPS'15). MIT Press, Cambridge, MA, USA, 3123–3131.
- R. David Evans, Lufei Liu, and Tor M. Aamodt. 2020. JPEG-ACT: Accelerating Deep Learning via Transform-based Lossy Compression. In Proceedings of the 47th Annual International Symposium on Computer Architecture (ISCA 47). https://doi.org/10.1109/ISCA45697.2020.00075
- Y. Fang, P. Chou, B. Chen, T. Lin, and J. Wang. 2017. An all-n-type dynamic adder for ultra-low-leakage IoT devices. In 2017 IEEE 12th International Conference on ASIC (ASICON). 68–71.
- M. Gautschi, M. Schaffner, F. K. Gürkaynak, and L. Benini. 2016. 4.6 A 65nm CMOS 6.4-to-29.2pJ/FLOP@0.8V shared logarithmic floating point unit for acceleration of nonlinear function kernels in a tightly coupled processor cluster. In 2016 IEEE International Solid-State Circuits Conference (ISSCC). 82–83.
- Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. 2015. Deep Learning with Limited Numerical Precision. CoRR abs/1502.02551 (2015). arXiv:1502.02551 http://arxiv.org/abs/1502.02551
- M. Hemmat, T. Shah, Y. Chen, and J. S. Miguel. 2020. CRANIA: Unlocking Data and Value Reuse in Iterative Neural Network Architectures. In 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC). 295–300.
- Seokin Hong, Prashant J. Nair, Bulent Abali, Alper Buyuktosunoglu, Kyu-Hyoun Kim, and Michael B. Healy. 2018. Attaché: Towards Ideal Memory Compression by Mitigating Metadata Bandwidth Overheads. In Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (Fukuoka, Japan) (MICRO-51). IEEE Press, 326–338. https://doi.org/10.1109/MICRO.2018.00034
- A. Jain, A. Phanishayee, J. Mars, L. Tang, and G. Pekhimenko. 2018. Gist: Efficient Data Encoding for Deep Neural Network Training. In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 776–789.
- Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 8024–8035. http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
- Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, and Todd C. Mowry. 2013. Linearly Compressed Pages: A Low-Complexity, Low-Latency Main Memory Compression Framework. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (Davis, California) (MICRO-46). Association for Computing Machinery, New York, NY, USA, 172–184. https://doi.org/10.1145/2540708.2540724
- G. Pekhimenko, V. Seshadri, O. Mutlu, M. A. Kozuch, P. B. Gibbons, and T. C. Mowry. 2012. Base-delta-immediate compression: Practical data compression for on-chip caches. In 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT). 377–388.
- M. Schaffner, M. Gautschi, F. K. Gürkaynak, and L. Benini. 2016. Accuracy and Performance Trade-Offs of Logarithmic Number Units in Multi-Core Clusters. In 2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH). 95–103.
- Wonyong Sung, Sungho Shin, and Kyuyeon Hwang. 2015. Resiliency of Deep Neural Networks under Quantization. arXiv preprint arXiv:1511.06488 (2015).
- G. K. Wallace. 1992. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38, 1 (1992), xviii–xxxiv.
- V. Young, S. Kariyappa, and M. K. Qureshi. 2019. Enabling Transparent Memory-Compression for Commodity Memory Systems. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). 570–581.