
Limitations of Shallow Networks

Chapter in: Recent Trends in Learning From Data

Part of the book series: Studies in Computational Intelligence (SCI, volume 896)


Abstract

Although biologically inspired neural networks were originally introduced as multilayer computational models, shallow networks dominated applications until the recent renewal of interest in deep architectures. Experimental evidence and successful applications of deep networks raise theoretical questions: when and why are deep networks better than shallow ones? This chapter presents some probabilistic and constructive results on the limitations of shallow networks. It shows implications of geometrical properties of high-dimensional spaces for probabilistic lower bounds on network complexity. The bounds depend on covering numbers of dictionaries of computational units and on the sizes of the domains of the functions to be computed. The probabilistic results are complemented by constructive ones built using Hadamard matrices and pseudo-noise sequences.
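
The constructive results mentioned in the abstract rely on properties of Hadamard matrices, whose rows are mutually orthogonal sign vectors. As a minimal illustration, not drawn from the chapter itself, the sketch below builds a Hadamard matrix by Sylvester's classical recursive construction and checks the orthogonality of its rows; the helper name sylvester_hadamard is introduced only for this example.

```python
import numpy as np

def sylvester_hadamard(k):
    """Build the 2^k x 2^k Hadamard matrix via Sylvester's recursion
    H_{2n} = [[H_n, H_n], [H_n, -H_n]], starting from H_1 = [[1]]."""
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

# Example: an 8 x 8 matrix with +-1 entries whose rows are mutually
# orthogonal, so H H^T = 8 I. This kind of highly "spread out" sign
# structure is what Hadamard-based constructions exploit.
H = sylvester_hadamard(3)
assert np.array_equal(H @ H.T, 8 * np.eye(8, dtype=int))
```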


Acknowledgements

This work was partially supported by the Czech Grant Foundation grant GA18-23827S and institutional support of the Institute of Computer Science RVO 67985807.

Author information

Correspondence to Věra Kůrková.

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Kůrková, V. (2020). Limitations of Shallow Networks. In: Oneto, L., Navarin, N., Sperduti, A., Anguita, D. (eds) Recent Trends in Learning From Data. Studies in Computational Intelligence, vol 896. Springer, Cham. https://doi.org/10.1007/978-3-030-43883-8_6
