Abstract
In deep neural network (DNN) accelerators, transferring model parameters from main memory to the processing elements is expensive: data movement accounts for a large fraction of inference latency and energy consumption. In this paper, we present three position-based, lossless techniques for compressing DNN model parameters, which can yield significant latency and energy improvements. The first technique exploits the regular repetition of DNN weight values. The second technique stores the relative distance between weights instead of the weights themselves. The third technique applies Huffman coding to the relative distances produced by the second technique. The proposed techniques are evaluated on several DNNs. The results show that the first technique reduces latency by 38% and energy by 36%, the second by 41% and 39%, and the third by 45% and 43%, respectively. In other words, adding Huffman coding achieves an additional 7% reduction in both latency and energy relative to the second technique alone.
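To make the second and third techniques concrete, the minimal sketch below is our own illustration (hypothetical function names, not the paper's implementation or exact bitstream format): the positions of the non-zero weights of a toy pruned weight vector are encoded as relative distances (gaps), and those gaps are then Huffman-coded.

```python
# Illustrative sketch only: delta-encode the positions of non-zero weights,
# then Huffman-code the resulting gaps. Not the paper's actual encoder.
import heapq
from collections import Counter
from itertools import count

def delta_encode_positions(weights):
    """Return (gaps, values): distance from the previous non-zero weight,
    plus the corresponding non-zero weight values."""
    gaps, values = [], []
    last = -1
    for i, w in enumerate(weights):
        if w != 0:
            gaps.append(i - last)   # relative distance instead of absolute index
            values.append(w)
            last = i
    return gaps, values

def huffman_code(symbols):
    """Build a Huffman code (symbol -> bit string) for a list of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate single-symbol case
        return {next(iter(freq)): "0"}
    tiebreak = count()                       # unique tie-breaker, avoids comparing dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

if __name__ == "__main__":
    weights = [0, 0, 3, 0, 0, 0, 3, 1, 0, 3]        # toy pruned weight row
    gaps, values = delta_encode_positions(weights)  # gaps = [3, 4, 1, 2]
    code = huffman_code(gaps)
    bits = "".join(code[g] for g in gaps)
    print(gaps, code, bits)
```

In a real accelerator the decoder would reverse these steps on the fly; the point of the sketch is only that gaps between non-zero weights tend to be small and highly skewed, which is what makes the additional Huffman stage pay off.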
Availability of data and materials
Not applicable.
Code availability
Not applicable.
Change history
07 July 2023
A Correction to this paper has been published: https://doi.org/10.1007/s11227-023-05514-7
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China under Grant No. 61902081.
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
MT wrote the paper. ER carried out the experiments. MP reviewed the paper.
Corresponding author
Ethics declarations
Conflict of interest
Not applicable.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: the author name Minghua Tang was incorrectly written as Minging Tang
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tang, M., Russo, E. & Palesi, M. The position-based compression techniques for DNN model. J Supercomput 79, 17445–17474 (2023). https://doi.org/10.1007/s11227-023-05339-4