A revisit to MacKay algorithm and its application to deep network compression

Li, Chune; Mao, Yongyi; Zhang, Richong; Huai, Jinpeng

doi:10.1007/s11704-019-8390-z

Research Article
Published: 03 January 2020

Volume 14, article number 144304, (2020)

Frontiers of Computer Science

Chune Li¹, Yongyi Mao², Richong Zhang¹ & Jinpeng Huai¹

Abstract

An iterative procedure introduced in MacKay’s evidence framework is often used to estimate the hyperparameter in empirical Bayes. Combined with a particular form of prior, the estimation of the hyperparameter reduces to an automatic relevance determination model, which provides a soft way of pruning model parameters. Despite its effectiveness, this estimation procedure has so far remained primarily a heuristic, and its application to deep neural networks has not yet been explored. This paper formally investigates the mathematical nature of the procedure and justifies it as a well-principled algorithmic framework, which we call the MacKay algorithm. As an application, we demonstrate its use in deep neural networks, which typically have complicated structures with millions of parameters and can be pruned to reduce memory requirements and boost computational efficiency. In experiments, we apply the MacKay algorithm to prune the parameters of simple networks such as LeNet, deep convolutional VGG-like networks, and residual networks for large image classification tasks. Experimental results show that the algorithm can compress neural networks to a high level of sparsity with little loss of prediction accuracy, which is comparable with the state of the art.
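
To make the iterative procedure concrete, the following is a minimal sketch of MacKay-style fixed-point updates with an automatic relevance determination prior. It is not the deep-network procedure of this paper: it assumes a toy model in which each observation y_i equals one weight w_i plus Gaussian noise of known variance, with prior w_i ~ N(0, 1/alpha_i). The function name `mackay_ard` and the parameters `noise_var`, `n_iter` and `alpha_cap` are hypothetical names chosen for illustration.

```python
def mackay_ard(y, noise_var, n_iter=200, alpha_cap=1e12):
    """MacKay-style fixed-point updates for per-weight precisions alpha_i.

    Toy model (an assumption for illustration, not the paper's setting):
    y_i = w_i + noise, noise ~ N(0, noise_var), prior w_i ~ N(0, 1 / alpha_i).
    """
    beta = 1.0 / noise_var           # known noise precision
    alpha = [1.0] * len(y)           # initial prior precisions
    for _ in range(n_iter):
        for i, y_i in enumerate(y):
            if alpha[i] >= alpha_cap:
                continue             # precision has diverged: weight is pruned
            post_var = 1.0 / (alpha[i] + beta)   # posterior variance of w_i
            post_mean = post_var * beta * y_i    # posterior mean of w_i
            if post_mean == 0.0:
                alpha[i] = alpha_cap             # no evidence for this weight
                continue
            # gamma measures how well the data determine w_i (between 0 and 1)
            gamma = 1.0 - alpha[i] * post_var
            # MacKay update: alpha_i <- gamma_i / mean_i^2, capped for pruning
            alpha[i] = min(gamma / post_mean ** 2, alpha_cap)
    # Posterior means under the final precisions; pruned weights are set to 0
    means = [0.0 if a >= alpha_cap else beta * y_i / (a + beta)
             for a, y_i in zip(alpha, y)]
    return alpha, means


if __name__ == "__main__":
    alpha, means = mackay_ard([3.0, 0.5, -2.0, 0.1], noise_var=1.0)
    print(alpha)
    print(means)
```

In this toy setting the update has a closed-form fixed point, alpha_i = 1 / (y_i² − noise_var), whenever y_i² exceeds the noise variance; otherwise alpha_i grows without bound and the corresponding weight is driven to zero, which is the soft pruning behaviour the abstract refers to.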

References

Li C, Mao Y, Zhang R, Huai J. On hyper-parameter estimation in empirical Bayes: a revisit of the MacKay algorithm. In: Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence. 2016, 477–486
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436
Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529
Bishop C M. Pattern Recognition and Machine Learning. Springer, New York, 2016
MacKay D J. The evidence framework applied to classification networks. Neural Computation, 1992, 4(5): 720–736
MacKay D J, Neal R M. Automatic relevance determination for neural networks. Technical Report in Preparation, Cambridge University, 1994
MacKay D J. Probable networks and plausible predictions: a review of practical Bayesian methods for supervised neural networks. Network Computation in Neural Systems, 1995, 6(3): 469–505
Bishop C M. Bayesian PCA. In: Proceedings of the 11th International Conference on Neural Information Processing Systems. 1999, 382–388
Tipping M E. Sparse Bayesian learning and the relevance vector machine. The Journal of Machine Learning Research, 2001, 1: 211–244
Tan V Y, Févotte C. Automatic relevance determination in nonnegative matrix factorization. In: SPARS’09-Signal Processing with Adaptive Sparse Structured Representations. 2009
MacKay D J. Bayesian interpolation. Neural Computation, 1992, 4(3): 415–447
MacKay D J. A practical Bayesian framework for backpropagation networks. Neural Computation, 1992, 4(3): 448–472
Solomon J. Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics. CRC Press, 2015
Murphy K P. Machine Learning: A Probabilistic Perspective. MIT Press, 2012
Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014, 1724–1734
Kim Y. Convolutional neural networks for sentence classification. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2014, 1746–1751
Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. 2012, 1097–1105
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2014, arXiv preprint arXiv:1409.1556
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, 770–778
Tran D, Bourdev L, Fergus R, Torresani L, Paluri M. Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision. 2015, 4489–4497
Srivastava N, Mansimov E, Salakhudinov R. Unsupervised learning of video representations using LSTMs. In: Proceedings of the International Conference on Machine Learning. 2015, 843–852
Deng L, Yu D. Deep learning: methods and applications. Foundations and Trends in Signal Processing, 2014, 7(3–4): 197–387
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg A C, Fei-Fei L. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 2015, 115(3): 211–252
Han S, Mao H, Dally W J. Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding. 2015, arXiv preprint arXiv:1510.00149
Li H, Kadav A, Durdanovic I, Samet H, Graf H P. Pruning filters for efficient convnets. 2016, arXiv preprint arXiv:1608.08710
Liu Z, Li J, Shen Z, Huang G, Yan S, Zhang C. Learning efficient convolutional networks through network slimming. In: Proceedings of the IEEE International Conference on Computer Vision. 2017, 2755–2763
Louizos C, Welling M, Kingma D P. Learning sparse neural networks through L0 regularization. In: Proceedings of International Conference on Learning Representations. 2018
Molchanov D, Ashukha A, Vetrov D. Variational dropout sparsifies deep neural networks. In: Proceedings of the International Conference on Machine Learning. 2017, 2498–2507
Neklyudov K, Molchanov D, Ashukha A, Vetrov D P. Structured Bayesian pruning via log-normal multiplicative noise. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 6775–6784
Dai B, Zhu C, Guo B, Wipf D. Compressing neural networks using the variational information bottleneck. In: Proceedings of the International Conference on Machine Learning. 2018, 1143–1152
Louizos C, Ullrich K, Welling M. Bayesian compression for deep learning. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 3290–3300
Karaletsos T, Rätsch G. Automatic relevance determination for deep generative models. 2015, arXiv preprint arXiv:1505.07765
Chatzis S P. Sparse Bayesian recurrent neural networks. In: Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 2015, 359–372
Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images. Technical Report, Citeseer, 2009
LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278–2324
He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision. 2015, 1026–1034
Kingma D P, Ba J. Adam: a method for stochastic optimization. 2014, arXiv preprint arXiv:1412.6980
Dong X, Huang J, Yang Y, Yan S. More is less: a more complicated network with less inference complexity. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, 5840–5848
He Y, Zhang X, Sun J. Channel pruning for accelerating very deep neural networks. In: Proceedings of the IEEE International Conference on Computer Vision. 2017
He Y, Kang G, Dong X, Fu Y, Yang Y. Soft filter pruning for accelerating deep convolutional neural networks. In: Proceedings of International Joint Conference on Artificial Intelligence. 2018, 2234–2240
Alemi A A, Fischer I, Dillon J V, Murphy K. Deep variational information bottleneck. 2016, arXiv preprint arXiv:1612.00410

Acknowledgements

This work was supported in part by the China Scholarship Council (201706020062), by the China 973 Program (2015CB358700), by the National Natural Science Foundation of China (Grant Nos. 61772059, 61421003), and by the Beijing Advanced Innovation Center for Big Data and Brain Computing (BDBC) and the State Key Laboratory of Software Development Environment (SKLSDE-2018ZX-17).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
Chune Li, Richong Zhang & Jinpeng Huai
School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, K1N6N5, Canada
Yongyi Mao

Authors

Chune Li
Yongyi Mao
Richong Zhang
Jinpeng Huai

Corresponding author

Correspondence to Richong Zhang.

Additional information

This is an expanded version of a preliminary conference paper that was presented at UAI 2016 [1]. Besides formulating a widely adopted procedure as a well-principled algorithmic framework, this paper significantly extends it to the application of deep neural network compression.

Chune Li obtained her BS degree in Computer Science and Technology from Beihang University, China in 2011. She is now a PhD student at the School of Computer Science and Engineering, Beihang University, China. Her research interests include machine learning and natural language processing.

Yongyi Mao completed his PhD in electrical engineering at the University of Toronto, Canada, in 2003 and joined the faculty of the School of Information Technology and Engineering at the University of Ottawa, Canada, as an assistant professor. He was promoted to associate professor in 2008 and then to full professor in 2012. Yongyi Mao’s research covers two main areas: communications and machine learning.

Jinpeng Huai received a PhD degree in Computer Science and Engineering from Beihang University, China. He is a professor at Beihang University, China. His research interests include software engineering and theory, distributed systems, grid computing, trustworthiness, network security, Internet and E-commerce technologies.

Richong Zhang received his PhD from the School of Information Technology and Engineering, University of Ottawa, Canada in 2011. He is currently an associate professor in the School of Computer Science and Engineering, Beihang University, China. His research interests include machine learning and data mining and their applications in recommender systems, knowledge graphs and crowdsourcing.