ABSTRACT
Memory is a scarce resource, and increasingly so in the age of deep neural networks. Memory compression is one solution to this scarcity. This work proposes NNW-BDI, a scheme for compressing pretrained neural network weights. NNW-BDI is a variation of the standard Base-Delta-Immediate (BDI) [13] compression technique, adapted to neural network weights through quantization, downscaling, randomized base selection, and base-delta configuration adjustment. We evaluate the scheme by compressing the weights of an MNIST classification network. Our evaluation shows that NNW-BDI reduces memory usage by up to 85% without any reduction in inference accuracy.
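To make the approach concrete, below is a minimal Python sketch of a BDI-style encoder operating on quantized weights. The quantization range, block size, candidate delta widths, and helper names (`quantize`, `bdi_compress_block`) are illustrative assumptions, not the paper's exact NNW-BDI configuration.

```python
# Minimal sketch of BDI-style compression for quantized neural network
# weights. All parameters here are assumptions for illustration, not the
# published NNW-BDI configuration.
import random
import numpy as np

def quantize(weights, bits=8, lo=-1.0, hi=1.0):
    """Uniformly quantize float weights into [0, 2**bits - 1] over a fixed range."""
    scale = (hi - lo) / (2**bits - 1)
    q = np.clip(np.round((weights - lo) / scale), 0, 2**bits - 1)
    return q.astype(np.int64)

def bdi_compress_block(block, delta_widths=(2, 4, 6)):
    """BDI-style encoding: one base value plus narrow per-element deltas.

    Tries each candidate delta width (in bits) and returns
    (base, width, deltas) for the narrowest width that covers the block,
    or None if the block must be stored uncompressed.
    """
    base = int(random.choice(block))      # randomized base selection
    deltas = block - base
    for nbits in delta_widths:            # base-delta configuration adjustment
        limit = 1 << (nbits - 1)          # signed delta range: [-limit, limit)
        if np.all((-limit <= deltas) & (deltas < limit)):
            return base, nbits, deltas
    return None

def block_bits(result, block_len):
    """Storage cost of one block: 8-bit base plus narrow deltas, or raw bytes."""
    if result is None:
        return 8 * block_len
    _, nbits, _ = result
    return 8 + nbits * block_len

# Example: 8-bit quantized weights, compressed in 32-element blocks.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=4096).astype(np.float32)  # small, clustered weights
q = quantize(w)
blocks = q.reshape(-1, 32)
results = [bdi_compress_block(b) for b in blocks]
orig_bits = 8 * q.size
comp_bits = sum(block_bits(r, 32) for r in results)
print(f"compressible blocks: {sum(r is not None for r in results)}/{len(blocks)}")
print(f"compression ratio: {orig_bits / comp_bits:.2f}x")
```

The narrower the delta width a block admits, the larger the savings; the quantization and downscaling steps exist precisely to cluster weight values so that narrow widths fit.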
REFERENCES
- A. Arunkumar, S. Lee, V. Soundararajan, and C. Wu. 2018. LATTE-CC: Latency Tolerance Aware Adaptive Cache Compression Management for Energy Efficient GPUs. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). 221–234.
- Davis Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, and John Guttag. 2020. What is the state of neural network pruning? arXiv preprint arXiv:2003.03033 (2020).
- Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2015. BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2 (Montreal, Canada) (NIPS'15). MIT Press, Cambridge, MA, USA, 3123–3131.
- R. David Evans, Lufei Liu, and Tor M. Aamodt. 2020. JPEG-ACT: Accelerating Deep Learning via Transform-based Lossy Compression. In Proceedings of the 47th Annual International Symposium on Computer Architecture (ISCA 47). https://doi.org/10.1109/ISCA45697.2020.00075
- Y. Fang, P. Chou, B. Chen, T. Lin, and J. Wang. 2017. An all-n-type dynamic adder for ultra-low-leakage IoT devices. In 2017 IEEE 12th International Conference on ASIC (ASICON). 68–71.
- M. Gautschi, M. Schaffner, F. K. Gürkaynak, and L. Benini. 2016. 4.6 A 65nm CMOS 6.4-to-29.2pJ/FLOP@0.8V shared logarithmic floating point unit for acceleration of nonlinear function kernels in a tightly coupled processor cluster. In 2016 IEEE International Solid-State Circuits Conference (ISSCC). 82–83.
- Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. 2015. Deep Learning with Limited Numerical Precision. CoRR abs/1502.02551 (2015). arXiv:1502.02551 http://arxiv.org/abs/1502.02551
- M. Hemmat, T. Shah, Y. Chen, and J. S. Miguel. 2020. CRANIA: Unlocking Data and Value Reuse in Iterative Neural Network Architectures. In 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC). 295–300.
- Seokin Hong, Prashant J. Nair, Bulent Abali, Alper Buyuktosunoglu, Kyu-Hyoun Kim, and Michael B. Healy. 2018. Attaché: Towards Ideal Memory Compression by Mitigating Metadata Bandwidth Overheads. In Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (Fukuoka, Japan) (MICRO-51). IEEE Press, 326–338. https://doi.org/10.1109/MICRO.2018.00034
- A. Jain, A. Phanishayee, J. Mars, L. Tang, and G. Pekhimenko. 2018. Gist: Efficient Data Encoding for Deep Neural Network Training. In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 776–789.
- Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 8024–8035. http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
- Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, and Todd C. Mowry. 2013. Linearly Compressed Pages: A Low-Complexity, Low-Latency Main Memory Compression Framework. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (Davis, California) (MICRO-46). Association for Computing Machinery, New York, NY, USA, 172–184. https://doi.org/10.1145/2540708.2540724
- G. Pekhimenko, V. Seshadri, O. Mutlu, M. A. Kozuch, P. B. Gibbons, and T. C. Mowry. 2012. Base-delta-immediate compression: Practical data compression for on-chip caches. In 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT). 377–388.
- M. Schaffner, M. Gautschi, F. K. Gürkaynak, and L. Benini. 2016. Accuracy and Performance Trade-Offs of Logarithmic Number Units in Multi-Core Clusters. In 2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH). 95–103.
- Wonyong Sung, Sungho Shin, and Kyuyeon Hwang. 2015. Resiliency of Deep Neural Networks under Quantization. arXiv preprint arXiv:1511.06488 (2015).
- G. K. Wallace. 1992. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38, 1 (1992), xviii–xxxiv.
- V. Young, S. Kariyappa, and M. K. Qureshi. 2019. Enabling Transparent Memory-Compression for Commodity Memory Systems. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). 570–581.