
Limitations of Shallow Networks

Chapter in: Recent Trends in Learning From Data

Part of the book series: Studies in Computational Intelligence (SCI, volume 896)


Abstract

Although biologically inspired neural networks were originally introduced as multilayer computational models, shallow networks dominated applications until the recent renewal of interest in deep architectures. Experimental evidence and successful applications of deep networks raise theoretical questions: when and why are deep networks better than shallow ones? This chapter presents some probabilistic and constructive results on the limitations of shallow networks. It shows implications of geometrical properties of high-dimensional spaces for probabilistic lower bounds on network complexity. The bounds depend on covering numbers of dictionaries of computational units and on the sizes of the domains of the functions to be computed. The probabilistic results are complemented by constructive ones built using Hadamard matrices and pseudo-noise sequences.
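
The constructive results mentioned in the abstract rely on properties of Hadamard matrices, whose rows are mutually orthogonal sign vectors. As a minimal illustration, not drawn from the chapter itself, the sketch below builds a Hadamard matrix by Sylvester's classical recursive construction and checks the orthogonality of its rows; the helper name sylvester_hadamard is introduced only for this example.

```python
import numpy as np

def sylvester_hadamard(k):
    """Build the 2^k x 2^k Hadamard matrix via Sylvester's recursion
    H_{2n} = [[H_n, H_n], [H_n, -H_n]], starting from H_1 = [[1]]."""
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

# Example: an 8 x 8 matrix with +-1 entries whose rows are mutually
# orthogonal, so H H^T = 8 I. This kind of highly "spread out" sign
# structure is what Hadamard-based constructions exploit.
H = sylvester_hadamard(3)
assert np.array_equal(H @ H.T, 8 * np.eye(8, dtype=int))
```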


Acknowledgements

This work was partially supported by the Czech Grant Foundation grant GA18-23827S and institutional support of the Institute of Computer Science RVO 67985807.

Author information

Correspondence to Věra Kůrková.

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Kůrková, V. (2020). Limitations of Shallow Networks. In: Oneto, L., Navarin, N., Sperduti, A., Anguita, D. (eds) Recent Trends in Learning From Data. Studies in Computational Intelligence, vol 896. Springer, Cham. https://doi.org/10.1007/978-3-030-43883-8_6
