Abstract
In deep neural network (DNN) accelerators, transferring model parameters from main memory to the processing elements is expensive: data movement accounts for a large fraction of inference latency and energy consumption. In this paper, we present three position-based, lossless techniques for compressing DNN model parameters, which can yield significant latency and energy improvements. The first technique exploits the regular repetition of DNN weight values. The second technique stores the relative distance between weights instead of the weights themselves. The third technique applies Huffman coding to the relative distances produced by the second technique. The proposed techniques are evaluated on several DNNs. The results show that the first technique reduces latency by 38% and energy by 36%, the second by 41% and 39%, and the third by 45% and 43%, respectively. In other words, adding Huffman coding achieves an additional 7% reduction in both latency and energy relative to the second technique alone.
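To make the second and third techniques concrete, the minimal sketch below is our own illustration (hypothetical function names, not the paper's implementation or exact bitstream format): the positions of the non-zero weights of a toy pruned weight vector are encoded as relative distances (gaps), and those gaps are then Huffman-coded.

```python
# Illustrative sketch only: delta-encode the positions of non-zero weights,
# then Huffman-code the resulting gaps. Not the paper's actual encoder.
import heapq
from collections import Counter
from itertools import count

def delta_encode_positions(weights):
    """Return (gaps, values): distance from the previous non-zero weight,
    plus the corresponding non-zero weight values."""
    gaps, values = [], []
    last = -1
    for i, w in enumerate(weights):
        if w != 0:
            gaps.append(i - last)   # relative distance instead of absolute index
            values.append(w)
            last = i
    return gaps, values

def huffman_code(symbols):
    """Build a Huffman code (symbol -> bit string) for a list of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate single-symbol case
        return {next(iter(freq)): "0"}
    tiebreak = count()                       # unique tie-breaker, avoids comparing dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

if __name__ == "__main__":
    weights = [0, 0, 3, 0, 0, 0, 3, 1, 0, 3]        # toy pruned weight row
    gaps, values = delta_encode_positions(weights)  # gaps = [3, 4, 1, 2]
    code = huffman_code(gaps)
    bits = "".join(code[g] for g in gaps)
    print(gaps, code, bits)
```

In a real accelerator the decoder would reverse these steps on the fly; the point of the sketch is only that gaps between non-zero weights tend to be small and highly skewed, which is what makes the additional Huffman stage pay off.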
Availability of data and materials
Not applicable.
Code availability
Not applicable.
Change history
07 July 2023
A Correction to this paper has been published: https://doi.org/10.1007/s11227-023-05514-7
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China under Grant No. 61902081.
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
MT wrote the paper. ER carried out the experiments. MP reviewed the paper.
Corresponding author
Ethics declarations
Conflict of interest
Not applicable.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: the author name Minghua Tang was incorrectly written as Minging Tang
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tang, M., Russo, E. & Palesi, M. The position-based compression techniques for DNN model. J Supercomput 79, 17445–17474 (2023). https://doi.org/10.1007/s11227-023-05339-4