
The position-based compression techniques for DNN model

The Journal of Supercomputing

A Correction to this article was published on 07 July 2023

This article has been updated

Abstract

In deep neural network (DNN) accelerators, transferring model parameters from main memory to the processing elements is expensive: data movement accounts for a large fraction of the inference latency and energy consumption. In this paper, we present three lossless, position-based techniques for compressing DNN model parameters that can yield significant energy and performance improvements. The first technique exploits the regular repetition of DNN weights to compress them. The second technique stores the relative distances between weights instead of the weights themselves. The third technique applies Huffman coding to the relative distances produced by the second technique. The proposed techniques are assessed on several DNNs. The results show that the first technique reduces latency by 38% and energy by 36%, the second technique reduces latency by 41% and energy by 39%, and the third technique reduces latency by 45% and energy by 43%. Applying Huffman coding achieves an additional 7% reduction in both latency and energy relative to the second technique.
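
The full paper is paywalled, so the sketch below is only an illustration of the general idea conveyed by the abstract, not the authors' implementation: quantized DNN weights repeat heavily, so each distinct value can be stored once together with the relative distances (deltas) between its occurrence positions, and the delta stream can then be Huffman-coded. The function names (position_encode, huffman_code) and the toy weight vector are assumptions made purely for illustration.

```python
# Illustrative sketch only: a guess at the flavour of "position-based" compression
# described in the abstract, not the authors' exact scheme.
import heapq
import itertools
from collections import Counter

def position_encode(weights):
    """Map each distinct weight value to the delta-encoded list of its positions."""
    positions = {}
    for idx, w in enumerate(weights):
        positions.setdefault(w, []).append(idx)
    encoded = {}
    for w, idxs in positions.items():
        # Store the first position, then the relative distance to each later occurrence.
        deltas = [idxs[0]] + [b - a for a, b in zip(idxs, idxs[1:])]
        encoded[w] = deltas
    return encoded

def huffman_code(symbols):
    """Return a prefix-free code (symbol -> bitstring) for the given symbol stream."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    counter = itertools.count()             # tie-breaker so heapq never compares dicts
    heap = [(f, next(counter), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]

# Toy usage: a heavily quantized weight vector with many repeated values.
weights = [0.5, 0.5, -0.25, 0.5, 0.0, -0.25, 0.5, 0.0]
table = position_encode(weights)            # per-value position deltas
deltas = [d for ds in table.values() for d in ds]
code = huffman_code(deltas)                 # Huffman-code the delta stream
bits = sum(len(code[d]) for d in deltas)
print(table)
print(code)
print("encoded size (bits):", bits)
```

In such an encoding the deltas of frequently repeated weight values tend to be small and highly skewed, which is precisely the situation in which the final Huffman-coding step pays off.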




Availability of data and materials

Not applicable.

Code availability

Not applicable.




Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under Grant No. 61902081.

Funding

Not applicable.

Author information


Contributions

MT wrote the paper. ER carried out the experiments. MP reviewed the paper.

Corresponding author

Correspondence to Minghua Tang.

Ethics declarations

Conflict of interest

Not applicable.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: the author name Minghua Tang was incorrectly written as Minging Tang.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tang, M., Russo, E. & Palesi, M. The position-based compression techniques for DNN model. J Supercomput 79, 17445–17474 (2023). https://doi.org/10.1007/s11227-023-05339-4


  • DOI: https://doi.org/10.1007/s11227-023-05339-4
