
HFPQ: deep neural network compression by hardware-friendly pruning-quantization

Published in Applied Intelligence

Abstract

This paper presents a hardware-friendly compression method for deep neural networks that combines layered channel pruning with power-of-two quantization. With only a small loss in model accuracy, it greatly reduces the hardware resources needed to deploy a network, including memory, multiply-accumulate units (MACs), and logic gates. Layered channel pruning groups the layers of the network by how much pruning each one degrades accuracy, prunes the layers in a specific order, and retrains the network after each layer is pruned. A tunable parameter controls the pruning rate, so the method can be adjusted to different practical applications. The quantization step converts high-precision weights to low-precision weights that are all either 0 or powers of 2; a second tunable parameter controls the quantized bit width, and hence the quantization precision. The proposed hardware-friendly pruning-quantization (HFPQ) method first prunes the network, retrains it, and then quantizes the weights. Experimental results show that HFPQ compresses VGGNet, ResNet and GoogLeNet by more than 30 times while reducing the number of FLOPs by more than 85%.
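The power-of-two quantization described in the abstract can be illustrated with a short sketch. The function below is an assumption-laden illustration, not the paper's algorithm: the name `quantize_pow2` and the `bit_width` parameter are hypothetical, and the exponent range is simply aligned to the largest weight magnitude. Each weight is rounded to the nearest power of two in the log domain, keeping its sign, and weights too small to represent are set to zero.

```python
import numpy as np

def quantize_pow2(weights, bit_width=4):
    """Round weights to values in {0, +/-2^k}: a sketch of power-of-two quantization.

    bit_width is an assumed knob: one bit for the sign, the remaining codes
    select among 2**(bit_width - 1) - 1 exponent levels, with one code
    reserved for zero.
    """
    w = np.asarray(weights, dtype=np.float64)
    out = np.zeros_like(w)
    nz = w != 0
    if not nz.any():
        return out

    # Align the highest exponent with the largest weight magnitude.
    e_max = int(np.floor(np.log2(np.max(np.abs(w)))))
    n_levels = 2 ** (bit_width - 1) - 1
    e_min = e_max - n_levels + 1

    # Nearest power of two in the log domain.
    exps = np.round(np.log2(np.abs(w[nz])))
    vals = np.sign(w[nz]) * 2.0 ** np.clip(exps, e_min, e_max)
    vals[exps < e_min] = 0.0  # below the smallest representable magnitude
    out[nz] = vals
    return out
```

Because every nonzero quantized weight is a power of two, each multiplication in a convolution reduces to a bit shift, which is what makes such a scheme attractive for hardware deployment.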



Acknowledgements

This work was supported by the Department of Science and Technology of Jiangsu Province, China (BE2018002-2, BE2018002-3).

Corresponding author

Correspondence to ShengLi Lu.



About this article


Cite this article

Fan, Y., Pang, W. & Lu, S. HFPQ: deep neural network compression by hardware-friendly pruning-quantization. Appl Intell 51, 7016–7028 (2021). https://doi.org/10.1007/s10489-020-01968-x
