Abstract
Although biologically inspired neural networks were originally introduced as multilayer computational models, shallow networks dominated applications until the recent renewal of interest in deep architectures. Experimental evidence and the successful applications of deep networks raise theoretical questions: when and why are deep networks better than shallow ones? This chapter presents probabilistic and constructive results on the limitations of shallow networks. It shows how geometrical properties of high-dimensional spaces yield probabilistic lower bounds on network complexity; the bounds depend on the covering numbers of dictionaries of computational units and on the sizes of the domains of the functions to be computed. The probabilistic results are complemented by constructive ones built from Hadamard matrices and pseudo-noise sequences.
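To make the Hadamard-based constructions concrete, the following minimal sketch (in Python with NumPy; illustrative only, not code from the chapter) builds a Hadamard matrix by Sylvester's recursion. Its rows are mutually orthogonal ±1 vectors, the kind of maximally uncorrelated sign patterns from which functions that are hard for shallow networks can be constructed.

```python
import numpy as np

def sylvester_hadamard(k: int) -> np.ndarray:
    """Build the 2**k x 2**k Hadamard matrix via Sylvester's recursion:
    H_{2n} = [[H_n, H_n], [H_n, -H_n]], starting from H_1 = [[1]]."""
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

# The defining property: rows are pairwise orthogonal +-1 vectors,
# so H @ H.T equals n times the identity matrix.
H = sylvester_hadamard(3)  # an 8 x 8 Hadamard matrix
n = H.shape[0]
assert np.array_equal(H @ H.T, n * np.eye(n, dtype=H.dtype))
```

Interpreting a row of such a matrix as the sign pattern of a function's values on a 2^k-point domain is one way (under the reading suggested by the abstract) that Hadamard matrices enter constructive lower bounds on the complexity of shallow networks.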
Acknowledgements
This work was partially supported by the Czech Science Foundation grant GA18-23827S and by institutional support of the Institute of Computer Science (RVO 67985807).
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Kůrková, V. (2020). Limitations of Shallow Networks. In: Oneto, L., Navarin, N., Sperduti, A., Anguita, D. (eds) Recent Trends in Learning From Data. Studies in Computational Intelligence, vol 896. Springer, Cham. https://doi.org/10.1007/978-3-030-43883-8_6
Print ISBN: 978-3-030-43882-1
Online ISBN: 978-3-030-43883-8
eBook Packages: Intelligent Technologies and Robotics