Obstacles to Depth Compression of Neural Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12397)

Abstract

Massive neural network models are often preferred over smaller models for their more favorable optimization landscapes during training. However, since the cost of evaluating a model grows with its size, it is desirable to obtain an equivalent compressed neural network model before deploying it for prediction. The best-studied tools for compressing neural networks obtain models with broadly similar architectures, including the depth of the model. No guarantees have been available for obtaining compressed models with substantially reduced depth. In this paper, we present fundamental obstacles to any algorithm achieving depth compression of neural networks. In particular, we show that depth compression is as hard as learning the input distribution, ruling out guarantees for most existing approaches. Furthermore, even when the input distribution is of a known, simple form, we show that there are no local algorithms for depth compression.
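
For concreteness, here is a minimal, hedged sketch of the kind of depth compression procedure these obstacles apply to: a distillation-style approach in which a shallow student network is trained to match a deep teacher on samples drawn from the input distribution. The sketch assumes PyTorch; the architectures, widths, and the choice of a Gaussian input distribution are illustrative assumptions, not the paper's construction.

    # Distillation-style depth compression sketch (illustrative only).
    # The teacher plays the role of the trained deep network f; the student
    # is drawn from a substantially shallower hypothesis class.
    import torch
    import torch.nn as nn

    def mlp(widths):
        # Fully connected ReLU network with the given layer widths.
        layers = []
        for i in range(len(widths) - 2):
            layers += [nn.Linear(widths[i], widths[i + 1]), nn.ReLU()]
        layers.append(nn.Linear(widths[-2], widths[-1]))
        return nn.Sequential(*layers)

    teacher = mlp([32, 64, 64, 64, 64, 1])  # deep model f (assumed already trained)
    student = mlp([32, 256, 1])             # shallow candidate compression of f
    teacher.eval()

    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for step in range(1000):
        # The student only ever sees inputs sampled from the input distribution;
        # here that distribution is assumed Gaussian purely for illustration.
        x = torch.randn(128, 32)
        with torch.no_grad():
            target = teacher(x)
        opt.zero_grad()
        loss = loss_fn(student(x), target)
        loss.backward()
        opt.step()

The sampling step is exactly where the obstacle lies: without knowing, or learning, the input distribution, there is no principled way to generate the inputs on which the shallow student must agree with the deep teacher.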

Notes

  1.

    In typical applications, f itself will be an approximation of some concept g known only through labeled examples, and the real goal is to find an approximation of g in \(\mathcal{H}\). To simplify our discussion, we will not attempt to find a compression of f which approximates this concept g better than f does itself, and so g can be safely ignored; a formal statement of the resulting objective is sketched after these notes.

  2.

    “Improper” in the sense of not belonging to the hypothesis class \(\mathcal {H}\).
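
As a purely illustrative formalization (the symbols \(\mathcal{D}\) for the input distribution and \(\ell\) for the loss are assumptions of this sketch, not necessarily the paper's notation), the depth compression objective discussed in the abstract and in Note 1 can be written as

    \[
      h^{\star} \in \operatorname*{arg\,min}_{h \in \mathcal{H}}\;
      \mathbb{E}_{x \sim \mathcal{D}}\!\left[\,\ell\bigl(h(x),\, f(x)\bigr)\,\right],
    \]

where \(f\) is the trained deep network and \(\mathcal{H}\) is a class of substantially shallower networks. The abstract's hardness result says that finding such an \(h^{\star}\) with guarantees is, in general, as hard as learning \(\mathcal{D}\) itself.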

Author information

Corresponding author

Correspondence to John Wilmes.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Burstein, W., Wilmes, J. (2020). Obstacles to Depth Compression of Neural Networks. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. Lecture Notes in Computer Science, vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_9

  • DOI: https://doi.org/10.1007/978-3-030-61616-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61615-1

  • Online ISBN: 978-3-030-61616-8

  • eBook Packages: Computer Science, Computer Science (R0)
