Entropy-based pruning method for convolutional neural networks

Abstract

Various compression approaches, including pruning techniques, have been developed to reduce the computational complexity of neural networks. Most pruning techniques determine the threshold for pruning weights or input features from a statistical analysis of the weight values after training is complete. Their compression performance is limited because they do not take into account the contribution of the weights to the output during training. To solve this problem, we propose an entropy-based pruning technique that determines the threshold by considering the average amount of information flowing from the weights to the output during training. In the experiment section, we demonstrate and analyze our method on a convolutional neural network image classifier trained on the Modified National Institute of Standards and Technology (MNIST) image data. The experimental results show that our technique improves compression performance by more than 28% overall compared to a well-known pruning technique, and improves pruning speed by 14%.
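To make the idea concrete, the sketch below shows one plausible reading of entropy-based threshold selection: estimate the Shannon entropy of a layer's weight-magnitude distribution and prune low-entropy (less informative) layers more aggressively. The abstract does not reproduce the paper's exact formulation, so the histogram-based entropy estimate, the base_quantile parameter, and all function names here are illustrative assumptions, not the authors' method.

```python
import numpy as np

def entropy_of_weights(w, num_bins=32):
    """Shannon entropy (in bits) of a layer's weight-magnitude distribution.

    Hypothetical formulation: the abstract only states that the threshold
    reflects the average information the weights carry to the output, so
    this histogram-based estimate is an assumption.
    """
    mags = np.abs(w).ravel()
    hist, _ = np.histogram(mags, bins=num_bins)
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins before taking logs
    return -np.sum(p * np.log2(p))

def entropy_based_threshold(w, base_quantile=0.5, num_bins=32):
    """Scale a magnitude quantile by the layer's normalized entropy.

    Layers whose weights carry less information (lower entropy) receive a
    higher threshold and are therefore pruned more aggressively.
    """
    h = entropy_of_weights(w, num_bins)
    h_max = np.log2(num_bins)          # entropy of a uniform histogram
    scale = 1.0 - h / h_max            # in [0, 1]; larger = less informative
    q = min(1.0, base_quantile + scale * (1.0 - base_quantile))
    return np.quantile(np.abs(w), q)

def prune(w, threshold):
    """Zero out weights whose magnitude falls below the threshold."""
    mask = np.abs(w) >= threshold
    return w * mask, mask

# Example: prune a randomly initialized bank of 3x3 convolution kernels.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(64, 32, 3, 3))
t = entropy_based_threshold(w)
w_pruned, mask = prune(w, t)
print(f"threshold={t:.4f}, sparsity={1 - mask.mean():.2%}")
```

In practice the threshold would be recomputed periodically while training, since the abstract stresses measuring the weights' information contribution during training rather than from post-training statistics alone.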


Acknowledgement

This work was supported by an Inha University Research Grant.

Author information

Corresponding author

Correspondence to Sanggil Kang.

About this article

Cite this article

Hur, C., Kang, S. Entropy-based pruning method for convolutional neural networks. J Supercomput 75, 2950–2963 (2019). https://doi.org/10.1007/s11227-018-2684-z
