Abstract
The convergence of neural networks is a central research question because it underpins both their universal approximation capability and our understanding of their structural complexity. In this paper we study a generalized convex incremental iteration method that extends previous formulations by admitting a broader range of weight parameters, and we prove a convergence rate for this convex iteration. To handle the non-compactness of input data and the fact that the objective function is unknown in practice, we adopt a discrete statistical perspective. We then present two implementation algorithms, backpropagation and random search; the latter keeps the network from becoming trapped in poor local minima during training. Finally, experiments on a variety of regression problems confirm the effectiveness of the algorithms and agree with our theoretical predictions.
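To make the convex iteration concrete, the sketch below shows one plausible realization of convex incremental learning with random search for a single-hidden-layer regression network, in the spirit of convex incremental extreme learning machines: at step \(n\) the network output is updated as the convex combination \(f_n = (1-\beta_n)f_{n-1} + \beta_n g_n\), where \(g_n\) is the best of several randomly drawn hidden nodes and \(\beta_n \in [0,1]\) is chosen by least squares to shrink the residual. The sigmoid node type, the parameter ranges, and all names here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convex_incremental_fit(X, y, max_nodes=100, n_candidates=20, seed=0):
    """Grow a single-hidden-layer network one node at a time.

    At step n the output becomes f_n = (1 - beta) * f_{n-1} + beta * g,
    where g is the best of n_candidates randomly drawn hidden nodes
    (the random search step) and beta in [0, 1] minimizes the residual.
    Candidates are accepted only if they reduce the training error.
    """
    rng = np.random.default_rng(seed)
    n_samples, dim = X.shape
    f = np.zeros(n_samples)              # current network output f_{n-1}
    for _ in range(max_nodes):
        e = y - f                        # current residual
        best_err, best_update = np.linalg.norm(e), None
        for _ in range(n_candidates):          # random search over nodes
            w = rng.uniform(-1.0, 1.0, dim)    # random input weights
            b = rng.uniform(-1.0, 1.0)         # random bias
            g = sigmoid(X @ w + b)             # candidate node output
            h = g - f
            denom = h @ h
            if denom == 0.0:
                continue
            # Least-squares mixing weight, clipped to keep the update convex.
            beta = float(np.clip((e @ h) / denom, 0.0, 1.0))
            err = np.linalg.norm(e - beta * h)
            if err < best_err:
                best_err, best_update = err, (1.0 - beta) * f + beta * g
        if best_update is None:          # no candidate improved the fit
            break
        f = best_update                  # convex incremental update
    return f

# Toy usage: fit sin(x) on [0, 2*pi].
X = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()
f = convex_incremental_fit(X, y)
print("final training RMSE:", np.sqrt(np.mean((y - f) ** 2)))
```

Because a rejected candidate leaves the network unchanged (\(\beta_n = 0\) recovers \(f_{n-1}\)), the training error in this sketch is non-increasing in the number of nodes, which is the kind of monotonicity a convergence-rate analysis of such schemes would exploit.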
Data Availability
Datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
Ethics declarations
Conflict of interest
All authors have declared that: (i) no support, financial or otherwise, has been received from any organization that may have an interest in the submitted work; and (ii) there are no other relationships or activities that could appear to have influenced the submitted work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, L., Wang, Y., Zhang, L. et al. The Convergence of Incremental Neural Networks. Neural Process Lett 55, 12481–12499 (2023). https://doi.org/10.1007/s11063-023-11429-4