Abstract
Massive neural network models are often preferred over smaller models for their more favorable optimization landscapes during training. However, since the cost of evaluating a model grows with its size, it is desirable to obtain an equivalent compressed neural network model before deploying it for prediction. The best-studied tools for compressing neural networks produce models with broadly similar architectures, including the same depth as the original model. No guarantees have been available for obtaining compressed models with substantially reduced depth. In this paper, we present fundamental obstacles to any algorithm achieving depth compression of neural networks. In particular, we show that depth compression is as hard as learning the input distribution, ruling out guarantees for most existing approaches. Furthermore, even when the input distribution is of a known, simple form, we show that there are no local algorithms for depth compression.
Notes
1. In typical applications, f itself will be an approximation of some concept g known only through labeled examples, and the real goal is to find an approximation of g in \(\mathcal{H}\). To simplify our discussion, we will not attempt to find a compression of f that approximates this concept g better than f does itself, and so g can be safely ignored; a sketch of the resulting setup appears after these notes.
2. “Improper” in the sense of not belonging to the hypothesis class \(\mathcal{H}\).
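The compression goal in note 1 can be written as a one-line objective. This is a minimal sketch with assumed notation not fixed above: an input distribution \(\mathcal{D}\), a trained network f to be compressed, an accuracy parameter \(\varepsilon > 0\), and a hypothesis class \(\mathcal{H}\) of networks of substantially smaller depth than f:
\[
\text{find } h \in \mathcal{H} \quad \text{such that} \quad \mathbb{E}_{x \sim \mathcal{D}}\!\left[\bigl(h(x) - f(x)\bigr)^{2}\right] \le \varepsilon .
\]
On this reading, the abstract's hardness claim is that no algorithm can guarantee such an h without, in effect, learning \(\mathcal{D}\) itself.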
Cite this paper
Burstein, W., Wilmes, J. (2020). Obstacles to Depth Compression of Neural Networks. In: Farkaš, I., Masulli, P., Wermter, S. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2020. Lecture Notes in Computer Science, vol. 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_9