Essential rate for approximation by spherical neural networks
Introduction
In the $d$-dimensional Euclidean space $\mathbb{R}^d$, feed-forward neural networks (FNNs) have attracted the attention of a large number of scholars for their universal approximation property. There are two main problems in the research of FNN approximation. The first, called density, asks whether it is possible to approximate the target function arbitrarily well by choosing a suitable network model. Typical results can be found in Chen and Chen (1995), Chui and Li (1992), Cybenko (1989), Funahashi (1989), Hornik, Stinchcombe, and White (1990), Leshno, Lin, Pinkus, and Schocken (1993), Park and Sandberg (1991, 1993), and so on.
The other problem, called complexity, is to determine how many neurons are necessary to yield a prescribed degree of approximation; it mainly describes the relationship among the topological structure of the hidden layers, the approximation ability, and the approximation rate. There have been many studies of this problem; we refer the reader to Barron (1993), Bulsari (1993), Ferrari and Stengel (2005), Korain (1993), Maiorov and Meir (1998), Makovoz (1998), Mhaskar and Micchelli (1995), Suzuki (1998) and Xu and Cao (2004).
Rates of approximation describe the trade-off between the accuracy of approximation and the complexity of the approximating functions. When such functions belong to a parameterized family, their complexity can be measured by the lengths of the parameter vectors (depending on the number of variables, on the degree of polynomials, or on the number of hidden units in neural networks, etc.). The comparison of rates of approximation between polynomials and FNNs has been studied by several authors. For example, in a previous paper (Cao, Lin, & Xu, 2010), we proved that if the activation function of the FNNs is analytic and non-polynomial, then the approximation rate of FNNs is not lower than that of polynomials. On the other hand, Konovalov, Leviatan, and Maiorov (2008) proved that if the target function is radial, then the approximation rate of algebraic polynomials is not slower than that of FNNs in the square integrable function space (indeed, Konovalov et al., 2008, proved this property for any ridge function manifold). Similar results can be found in Maiorov and Pinkus (1999), Mhaskar (1996), Petrushev (1999) and Xie and Cao (2010) and the references therein.
In order to reflect the approximation capability of FNNs more precisely, it is natural to ask about the lower bound of approximation. Regarding this question, several papers, such as Konovalov et al. (2008, 2009), Maiorov (1999, 2003) and Xu and Cao (2004), deal with the lower bound for approximation by FNNs with various activation functions and target functions. If the upper and lower bounds are asymptotically identical, then we call their common degree the essential rate of approximation.
On the other hand, in many applications such as geophysics, meteorology, graph rendering and so on, the data are usually collected over a sphere or a sphere-like area. One then seeks a functional model for the mechanism that generates the data. For example, the mathematical models of satellite missions such as GOCE and CHAMP, which study the gravity potential of the earth, require solving spherical Fredholm integral equations of the first kind. Hence, finding a tool that can handle spherical data by exploiting special properties of the sphere becomes more and more important.
A feasible tool for dealing with spherical data is spherical polynomials (SPs). Direct and inverse approximation theorems for SPs have been established by several scholars using well-known spherical polynomial operators: Lizorkin and Nikol’skiĭ (1983) for the spherical Jackson operator; Mhaskar, Narcowich, and Ward (1999) for the spherical delayed means operator; Wang and Li (2000) for the spherical de la Vallée Poussin operator; Dai and Ditzian (2008) for the best approximation operator; etc.
A major problem of approximation by SPs is the so-called curse of dimensionality, whereby performance degrades rapidly as the dimensionality of the problem increases. Several procedures have been suggested in order to circumvent this problem. A typical approach on the sphere is the zonal function networks (ZFNs), formed as
$$N_n(x) = \sum_{i=1}^{n} c_i\,\phi(x\cdot x_i),$$
where the inner weights $x_i$ are the sites of scattered spherical data and $x\cdot x_i$ denotes the inner product of the $d$-dimensional vectors $x$ and $x_i$. In the seminal paper of Sun and Cheney (1997), necessary and sufficient conditions for the density of ZFNs were deduced. Two years later, Mhaskar et al. (1999) established the complexity of approximation by ZFNs, comparing the rate of approximation by ZFNs with that of SPs. They proved that if the activation function of the ZFNs satisfies certain conditions (as the Gaussian function does), then the upper rates of approximation by ZFNs and SPs are identical when the number of ZFN neurons and the degree of the SPs satisfy a suitable relation; in particular, they established the rate of approximation by ZFNs with Gaussian activation function in a Sobolev class (which will be defined in Section 2). For general target functions and activation functions, Mhaskar et al. (1999) bounded the best approximation by ZFNs by the sum of the best approximation by SPs and a redundancy term depending on the smoothness of the activation function. Further studies of approximation by ZFNs on the sphere can be found in Mhaskar (2006), Mhaskar, Narcowich, and Ward (2003) and Narcowich, Sun, Ward, and Wendland (2007).
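As an illustration (not part of the paper's analysis), the ZFN model just described can be fitted to scattered spherical data by least squares. The Gaussian-type zonal kernel, the width parameter `gamma`, the target function, and all sizes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere_points(n):
    """Draw n points approximately uniformly on the unit sphere S^2 in R^3."""
    v = rng.standard_normal((n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def zfn_design(x, centers, gamma=4.0):
    """Design matrix of a Gaussian zonal function network: each basis
    function exp(gamma * (x . x_i - 1)) depends only on the inner product."""
    return np.exp(gamma * (x @ centers.T - 1.0))

# A smooth target function of position on the sphere (illustrative choice).
f = lambda x: np.sin(3 * x[:, 2]) + x[:, 0] * x[:, 1]

X = sphere_points(2000)        # sample sites
centers = sphere_points(100)   # the n "neurons" sit at scattered data sites
A = zfn_design(X, centers)
coef, *_ = np.linalg.lstsq(A, f(X), rcond=None)

err = np.max(np.abs(A @ coef - f(X)))
print("uniform error on samples:", err)
```

With only 100 zonal units the fit of this smooth target is already tight, which is the complexity phenomenon the cited works quantify.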
In this paper, following the traditional idea of neural networks, we introduce a new approximant on the sphere, called spherical neural networks (SNNs), formed as
$$N_n(x) = \sum_{i=1}^{n} c_i\,\phi(w_i\cdot x + \theta_i), \qquad (1.2)$$
and we consider the collection of all functions of the form (1.2). It is obvious that a ZFN is a special type of SNN (obtained by setting the thresholds $\theta_i$ to 0 and restricting the inner weights $w_i$ to the sphere); thus results about ZFNs are automatically results about SNNs. Our main motivation for introducing SNNs is that, by adding thresholds to ZFNs, we can essentially improve the rate of approximation. More precisely, with SNNs we can deduce a result similar to that for ZFNs using far fewer neurons. Indeed, it will be shown in Section 3 that there exists an SNN with an analytic, strictly increasing and sigmoidal activation function whose upper bound of approximation is not larger than that of SPs. Therefore, any upper bound of approximation by SPs yields an upper bound of approximation by SNNs; in particular, this gives an approximation rate for SNNs that is better than that of ZFNs.
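The containment of ZFNs in SNNs described above can be checked mechanically. A minimal sketch (the activation `tanh`, the dimensions, and all parameter values are illustrative assumptions, not choices made in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere_points(n, d=3):
    """n points approximately uniformly distributed on the unit sphere in R^d."""
    v = rng.standard_normal((n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def snn(x, w, theta, c, phi=np.tanh):
    """Spherical neural network:  sum_i c_i * phi(w_i . x + theta_i)."""
    return phi(x @ w.T + theta) @ c

x = sphere_points(5)
w = sphere_points(4)     # inner weights restricted to the sphere ...
theta = np.zeros(4)      # ... and thresholds set to zero
c = rng.standard_normal(4)

# With these restrictions the SNN reduces to a ZFN: phi(x . x_i) only.
zfn = np.tanh(x @ w.T) @ c
print(np.allclose(snn(x, w, theta, c), zfn))   # True
```

The extra thresholds enlarge the parameter set, which is exactly what the improved upper bound in Section 3 exploits.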
The other contribution of this paper is the study of the lower bound of approximation by SNNs. With the help of a lemma proved by Maiorov (1999) and the Funk–Hecke formula, we will prove that the lower rate of approximation by SNNs asymptotically matches the upper rate obtained in Section 3.
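The Funk–Hecke formula used in that argument states that integrating a zonal kernel against a spherical harmonic reproduces the harmonic up to a multiplier. A quick numerical check on $\mathbb{S}^2$ with the illustrative choices $\phi(t)=e^{t}$ and $Y_1(y)=y_3$, for which the multiplier is $\lambda_1 = 2\pi\int_{-1}^{1} t e^{t}\,dt = 4\pi/e$:

```python
import numpy as np

# Midpoint quadrature grid on S^2 in spherical coordinates (polar t, azimuth p).
n = 400
t = (np.arange(n) + 0.5) * np.pi / n
p = (np.arange(n) + 0.5) * 2 * np.pi / n
T, P = np.meshgrid(t, p, indexing="ij")
Y = np.stack([np.sin(T) * np.cos(P),
              np.sin(T) * np.sin(P),
              np.cos(T)], axis=-1)               # quadrature points y
dA = np.sin(T) * (np.pi / n) * (2 * np.pi / n)   # surface element weights

x = np.array([0.6, 0.0, 0.8])                    # a fixed unit vector
lhs = np.sum(np.exp(Y @ x) * Y[..., 2] * dA)     # ∫ φ(x·y) Y_1(y) dω(y)

lam1 = 4 * np.pi / np.e                          # Funk–Hecke multiplier
print(lhs, lam1 * x[2])                          # both ≈ 3.70
```

The agreement of the two numbers is the content of the formula: the left side is a genuine surface integral, the right side is the multiplier times $Y_1(x)$.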
The rest of this paper is organized as follows. In the next section, we will give some preliminaries about the classical spherical polynomials. The upper bound of approximation by SNNs will be proved in Section 3, where the relation between approximation by SNNs and SPs will be also given. The lower bound of approximation by SNNs will be shown in Section 4, while in the last section, we will give some remarks.
To aid our description, we adopt the following conventions regarding symbols. We use $c, c_1, c_2, \ldots$ to denote constants depending only on the dimension $d$, whose values may be different at different occurrences, even within the same formula. For positive quantities $A$ and $B$, the symbol $A \sim B$ means $c B \le A \le c_1 B$. The volume of $\mathbb{S}^{d-1}$ is denoted by $|\mathbb{S}^{d-1}|$, and it is easy to deduce that $|\mathbb{S}^{d-1}| = 2\pi^{d/2}/\Gamma(d/2)$.
Notations and preliminaries
First, we introduce a Sobolev space on the sphere.
Consider the Hilbert space $L^2(\mathbb{S}^{d-1})$ with the usual norm and inner product induced by the elementary surface element $d\omega$ on $\mathbb{S}^{d-1}$. The Laplace–Beltrami operator is defined in the standard way (see Freeden et al., 1998, Müller, 1966, Wang and Li, 2000). For every positive integer $r$, we denote by $W_2^r$ the class of functions $f \in L^2(\mathbb{S}^{d-1})$ whose derivatives of order $r$, taken in the Laplace–Beltrami sense, remain in $L^2(\mathbb{S}^{d-1})$, …
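The elided formulas here are standard; a reconstruction in the usual spherical-harmonics setting (the normalization conventions below are assumptions, not taken from the paper) reads:

```latex
\|f\|_2 = \Bigl(\int_{\mathbb{S}^{d-1}} |f(x)|^2 \, d\omega(x)\Bigr)^{1/2},
\qquad
\langle f, g \rangle = \int_{\mathbb{S}^{d-1}} f(x)\, g(x)\, d\omega(x).
% If Y_{k,l} is an orthonormal basis of spherical harmonics, then
% -\Delta Y_{k,l} = k(k+d-2)\, Y_{k,l}, and for f = \sum_{k,l} \hat f_{k,l} Y_{k,l}
% the Sobolev class admits the equivalent description
W_2^r = \Bigl\{ f \in L^2(\mathbb{S}^{d-1}) :
\sum_{k,l} \bigl(1 + k(k+d-2)\bigr)^{r} \, |\hat f_{k,l}|^2 < \infty \Bigr\}.
```

The eigenvalue $k(k+d-2)$ of $-\Delta$ on degree-$k$ harmonics is what makes the Laplace–Beltrami operator the natural smoothness gauge on the sphere.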
Upper bound of approximation
Let $C(\mathbb{S}^{d-1})$ be the set of continuous functions on $\mathbb{S}^{d-1}$. In this section, we prove that for any target function in the Sobolev class and any number of neurons, there exists an SNN with an analytic, strictly increasing and sigmoidal activation function whose uniform approximation error admits the stated bound, where $\|\cdot\|_\infty$ denotes the uniform norm on $\mathbb{S}^{d-1}$.
The following Lemma 3.1, proved by Maiorov and Pinkus (1999), will play a crucial role in our proof.

Lemma 3.1. There exists a function which is real analytic, strictly increasing, and sigmoidal …
Lower bound of approximation
In this section, motivated by Maiorov (1999), we prove that the upper and lower rates of approximation by SNNs are asymptotically identical.
Consider the vector set consisting of all vectors with the prescribed coordinates, and let two natural numbers be given. Take arbitrary algebraic polynomials with real coefficients in the corresponding variables, each of the given degree, and construct the polynomials in the …
Conclusions and remarks
In Section 3, we obtained the upper bound of approximation by SNNs with an analytic, strictly increasing, sigmoidal activation function. In Section 4, we deduced the lower bound of approximation by SNNs with a square integrable activation function. Combining these, we obtain the following theorem.

Theorem 5.1. There exists an activation function that is analytic, strictly increasing, sigmoidal and square Lebesgue integrable such that …
Acknowledgments
The authors wish to thank the referees for their helpful suggestions. The research was supported by the National 973 Project (2007CB311002) and the National Natural Science Foundation of China (Nos. 90818020, 60873206).
References
- Some analytical solutions to the general approximation problem for feedforward neural networks. Neural Networks (1993)
- Approximation by ridge functions and neural networks with one hidden layer. Journal of Approximation Theory (1992)
- On the approximate realization of continuous mapping by neural networks. Neural Networks (1989)
- Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks (1990)
- Approximation by polynomials and ridge functions of the classes of s-monotone radial functions. Journal of Approximation Theory (2008)
- Approximation of Sobolev classes by polynomials and ridge functions. Journal of Approximation Theory (2009)
- Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks (1993)
- On best approximation by ridge functions. Journal of Approximation Theory (1999)
- On best approximation of classes by radial functions. Journal of Approximation Theory (2003)
- Lower bounds for approximation by MLP neural networks. Neurocomputing (1999)
- Uniform approximation by neural networks. Journal of Approximation Theory
- Weighted quadrature formulas and approximation by zonal function networks on the sphere. Journal of Complexity
- Degree of approximation by neural networks with a single hidden layer. Advances in Applied Mathematics
- Zonal function network frames on the sphere. Neural Networks
- Constructive function approximation by three-layer artificial neural networks. Neural Networks
- The errors in simultaneous approximation by feed-forward neural networks. Neurocomputing
- Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory
- Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks
- Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems
- Jackson inequality for Banach spaces on the sphere. Acta Mathematica Hungarica