Neural Networks

Volume 24, Issue 7, September 2011, Pages 752-758

Essential rate for approximation by spherical neural networks

https://doi.org/10.1016/j.neunet.2011.04.005

Abstract

We consider the optimal rate of approximation by single hidden layer feed-forward neural networks on the unit sphere. It is proved that there exists a neural network with $n$ neurons, and an analytic, strictly increasing, sigmoidal activation function, such that the deviation of the Sobolev class $W_2^{2r}(\mathbb{S}^d)$ from the class of neural networks $\Phi_{\phi,n}$ behaves asymptotically as $n^{-2r/(d-1)}$. Namely, we prove that the essential rate of approximation by spherical neural networks is $n^{-2r/(d-1)}$.

Introduction

In the $(d+1)$-dimensional Euclidean space $\mathbb{R}^{d+1}$, feed-forward neural networks (FNNs) have attracted the attention of a large number of scholars for their universal approximation property. There are two main problems in the research of FNN approximation. The first, called density, deals with deciding whether it is possible to approximate the target function arbitrarily well by choosing a suitable network model. Typical results can be found in Chen and Chen (1995), Chui and Li (1992), Cybenko (1989), Funahashi (1989), Hornik, Stinchcombe, and White (1990), Leshno, Lin, Pinkus, and Schocken (1993), Park and Sandberg (1991, 1993), and so on.

The other problem, called complexity, is to determine how many neurons are necessary to yield a prescribed degree of approximation; it mainly describes the relationship among the topological structure of the hidden layer, the approximation ability, and the approximation rate. There have been many studies of this problem. We refer the reader to Barron (1993), Bulsari (1993), Ferrari and Stengel (2005), Korain (1993), Maiorov and Meir (1998), Makovoz (1998), Mhaskar and Micchelli (1995), Suzuki (1998) and Xu and Cao (2004).

Rates of approximation describe the trade-off between the accuracy of approximation and the complexity of the approximating functions. When such functions belong to a parameterized family, their complexity can be measured by the lengths of the parameter vectors (depending on the number of variables, on the degree of the polynomials, or on the number of hidden units in the neural network, etc.). The comparison of rates of approximation between polynomials and FNNs has been studied by several authors. For example, in a previous paper (Cao, Lin, & Xu, 2010), we proved that if the activation function of an FNN is analytic and non-polynomial, then the approximation rate of FNNs is not lower than that of polynomials. On the other hand, Konovalov, Leviatan, and Maiorov (2008) proved that if the target function is radial, then the approximation rate of algebraic polynomials is not slower than that of FNNs in the square integrable function space (indeed, Konovalov et al., 2008, proved this property for any ridge function manifold). Similar results can be found in Maiorov and Pinkus (1999), Mhaskar (1996), Petrushev (1999), Xie and Cao (2010) and the references therein.

In order to reflect the approximation capability of FNNs more precisely, it is natural to ask: what about the lower bound of approximation? Regarding this question, there have been several papers, such as Konovalov et al. (2008, 2009), Maiorov (1999, 2003) and Xu and Cao (2004), dealing with lower bounds for approximation by FNNs with various activation functions and target functions. If the upper and lower bounds are asymptotically identical, then we call their common degree the essential rate of approximation.

On the other hand, in many applications such as geophysics, meteorology, graphics rendering and so on, the data are usually collected over a sphere or a sphere-like area. One then seeks a functional model for the mechanism that generates the data. For example, the mathematical models of satellite missions such as GOCE and CHAMP, which study the gravity potential of the earth, require solving spherical Fredholm integral equations of the first kind. Hence, finding a tool which can deal with spherical data by using some special properties of the sphere becomes more and more important.

A feasible tool for dealing with spherical data is spherical polynomials (SPs). The direct and inverse approximation theorems for SPs have been studied by several scholars by using some well-known spherical polynomial operators: Lizorkin and Nikol’skiĭ (1983) for the spherical Jackson operator; Mhaskar, Narcowich, and Ward (1999) for the spherical delayed means operator; Wang and Li (2000) for the spherical de la Vallée Poussin operator; Dai and Ditzian (2008) for the best approximation operator, etc.

A major problem of approximation by SPs is the so-called curse of dimensionality, whereby performance degrades rapidly as the dimensionality of the problem increases. Several procedures have been suggested in order to circumvent this problem. A typical approach on the sphere is the zonal function networks (ZFNs) of the form $x \mapsto \sum_{k=1}^{n} a_k \phi(\langle \xi_k, x \rangle)$, where the weights $\xi_k$ are the sites of scattered spherical data, and $\langle x, y \rangle$ denotes the inner product of the $(d+1)$-dimensional vectors $x$ and $y$. In the seminal paper (Sun & Cheney, 1997), sufficient and necessary conditions for the density of ZFNs were deduced. Two years later, Mhaskar et al. (1999) established the complexity of approximation by ZFNs. They compared the rate of approximation of ZFNs with that of SPs, and proved that if the activation function of the ZFN satisfies some conditions (such as the Gaussian function), then the upper rates of approximation by ZFNs and SPs are identical when the number of neurons $n$ of the ZFN and the degree $s$ of the SPs satisfy $n \asymp s^d$; that is, they proved that the rate of approximation by ZFNs with Gaussian activation function in the Sobolev class $W_2^{2r}$ (which will be defined in Section 2) is $\mathcal{O}(n^{-2r/d})$. For general target functions and activation functions, Mhaskar et al. (1999) bounded the best approximation by ZFNs by the sum of the best approximation by SPs and a redundancy term depending on the smoothness of the activation function. Further studies of approximation by ZFNs on the sphere can be found in Mhaskar (2006), Mhaskar, Narcowich, and Ward (2003) and Narcowich, Sun, Ward, and Wendland (2007).
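
As an illustration of the ZFN architecture (a minimal sketch, not taken from the cited works; the centers, coefficients and activation are placeholder choices), the following Python snippet evaluates $x \mapsto \sum_{k=1}^{n} a_k \phi(\langle \xi_k, x \rangle)$ on $\mathbb{S}^2$ with a Gaussian-type zonal activation:

```python
import numpy as np

def zfn(x, centers, coeffs, phi):
    """Zonal function network: x -> sum_k a_k * phi(<xi_k, x>), x on the sphere."""
    return sum(a * phi(np.dot(xi, x)) for a, xi in zip(coeffs, centers))

# Placeholder data: n = 3 scattered sites on S^2 (normalized) and arbitrary coefficients.
rng = np.random.default_rng(0)
centers = rng.standard_normal((3, 3))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
coeffs = [0.5, -1.2, 0.7]
gauss = lambda t: np.exp(-(1.0 - t))   # Gaussian kernel exp(-|x - xi|^2 / 2) rewritten in the inner product for unit vectors

x = np.array([0.0, 0.0, 1.0])          # evaluation point on S^2
print(zfn(x, centers, coeffs, gauss))
```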

In this paper, by using the traditional idea of neural networks, we introduce a new approximant on the sphere, called spherical neural networks (SNNs), of the form $N_{\phi,n}(x) := \sum_{i=1}^{n} c_i \phi(\langle w_i, x \rangle + \theta_i)$, $x \in \mathbb{S}^d$, where $w_i \in \mathbb{R}^{d+1}$ and $\theta_i, c_i \in \mathbb{R}$. We denote by $\Phi_{\phi,n}$ the collection of all functions of this form. It is obvious that a ZFN is a special type of SNN (obtained by setting the thresholds to 0 and restricting the inner weights to the sphere); thus results about ZFNs are automatically results about SNNs. Our main motivation for introducing SNNs is that, by adding thresholds to ZFNs, we can essentially improve the rate of approximation. More precisely, by using SNNs we can deduce a result similar to that for ZFNs with far fewer neurons. Indeed, it will be shown in Section 3 that if $n \asymp s^{d-1}$, then there exists an SNN with an analytic, strictly increasing and sigmoidal activation function whose approximation error is not larger than that of SPs. Therefore, upper bounds for approximation by SPs yield upper bounds for approximation by SNNs. For example, if $f \in W_2^{2r}$, then the approximation rate of SNNs is $\mathcal{O}(n^{-2r/(d-1)})$, which is better than that of ZFNs.
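
The SNN differs from the ZFN above only through the free inner weights $w_i \in \mathbb{R}^{d+1}$ and the thresholds $\theta_i$. The sketch below (again with hypothetical parameter values) evaluates an SNN and recovers a ZFN as the special case $\theta_i = 0$, $w_i \in \mathbb{S}^d$:

```python
import numpy as np

def snn(x, weights, thetas, coeffs, phi):
    """Spherical neural network: x -> sum_i c_i * phi(<w_i, x> + theta_i), x on S^d."""
    return sum(c * phi(np.dot(w, x) + t) for c, w, t in zip(coeffs, weights, thetas))

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))     # a sigmoidal activation, for illustration only

rng = np.random.default_rng(1)
weights = rng.standard_normal((4, 3))            # inner weights w_i in R^{d+1}, here d = 2
thetas = rng.standard_normal(4)                  # thresholds theta_i in R
coeffs = rng.standard_normal(4)                  # outer coefficients c_i

x = np.array([0.0, 0.6, 0.8])                    # point on S^2
print(snn(x, weights, thetas, coeffs, sigmoid))

# ZFN as a special case: zero thresholds and inner weights restricted to the sphere.
unit_weights = weights / np.linalg.norm(weights, axis=1, keepdims=True)
print(snn(x, unit_weights, np.zeros(4), coeffs, sigmoid))
```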

The other contribution of this paper is a study of the lower bound of approximation by SNNs. With the help of a lemma proved by Maiorov (1999) and the Funk–Hecke formula, we will prove that, for target functions in $W_2^{2r}$, the lower rate of approximation by SNNs also behaves asymptotically as $n^{-2r/(d-1)}$.
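
For background, we recall the Funk–Hecke formula in its standard form (the normalization below follows the usual convention, e.g. in Müller (1966), and is stated here only as an aid to the reader): for a kernel $\phi$ integrable on $[-1,1]$ with respect to the weight $(1-t^2)^{(d-2)/2}$ and any spherical harmonic $Y_k$ of degree $k$ on $\mathbb{S}^d$,

$$\int_{\mathbb{S}^d} \phi(\langle x, y \rangle)\, Y_k(y)\, d\omega(y) = \lambda_k\, Y_k(x), \qquad \lambda_k = \Omega_{d-1} \int_{-1}^{1} \phi(t)\, \frac{C_k^{(d-1)/2}(t)}{C_k^{(d-1)/2}(1)}\, (1-t^2)^{\frac{d-2}{2}}\, dt,$$

where $C_k^{(d-1)/2}$ is the Gegenbauer polynomial and $\Omega_{d-1}$ denotes the surface area of $\mathbb{S}^{d-1}$.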

The rest of this paper is organized as follows. In the next section, we give some preliminaries about classical spherical polynomials. The upper bound of approximation by SNNs is proved in Section 3, where the relation between approximation by SNNs and by SPs is also given. The lower bound of approximation by SNNs is established in Section 4, and in the last section we give some remarks.

To aid our description, we adopt the following convention regarding symbols. Let $C, C_1, C_2, \ldots$ be constants depending only on $d$, whose values will be different at different occurrences, even within the same formula. The symbol $A \asymp B$ means $CA \le B \le C_1 A$. The volume of $\mathbb{S}^d$ is denoted by $\Omega_d$, and it is easy to deduce that $\Omega_d := \int_{\mathbb{S}^d} d\omega = \frac{2\pi^{(d+1)/2}}{\Gamma\left(\frac{d+1}{2}\right)}$.
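
As a quick numerical check of this surface-area formula (an aside, not part of the original text), the snippet below evaluates $\Omega_d = 2\pi^{(d+1)/2}/\Gamma\left(\frac{d+1}{2}\right)$ for small $d$ and compares it with the familiar values $|\mathbb{S}^1| = 2\pi$, $|\mathbb{S}^2| = 4\pi$ and $|\mathbb{S}^3| = 2\pi^2$:

```python
import math

def omega(d):
    """Surface area of the unit sphere S^d in R^{d+1}: 2 * pi^((d+1)/2) / Gamma((d+1)/2)."""
    return 2.0 * math.pi ** ((d + 1) / 2) / math.gamma((d + 1) / 2)

for d, expected in [(1, 2 * math.pi), (2, 4 * math.pi), (3, 2 * math.pi ** 2)]:
    print(d, omega(d), expected)   # the two columns agree
```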

Section snippets

Notations and preliminaries

First, we introduce a Sobolev space on the sphere.

Consider the Hilbert space $L^2(\mathbb{S}^d)$ with norm $\|f\|_2 := \|f\|_{L^2(\mathbb{S}^d)} := \left(\int_{\mathbb{S}^d} |f(x)|^2\, d\omega(x)\right)^{1/2}$ and inner product $\langle f, g \rangle_2 := \int_{\mathbb{S}^d} f(x)\overline{g(x)}\, d\omega(x)$, where $d\omega(x)$ is the elementary surface piece on $\mathbb{S}^d$. The Laplace–Beltrami operator $\Delta$ is defined by (see Freeden et al., 1998; Müller, 1966; Wang and Li, 2000) $\Delta f := \sum_{i=1}^{d+1} \frac{\partial^2 g(x)}{\partial x_i^2}\Big|_{|x|=1}$, where $|x| := (x_1^2 + x_2^2 + \cdots + x_{d+1}^2)^{1/2}$ and $g(x) := f\left(\frac{x}{|x|}\right)$. For every positive integer $r$, we denote by $H_2^{2r}(\mathbb{S}^d)$ the class of functions $f$ for which $\Delta^r f \in L^2(\mathbb{S}^d)$,
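
To illustrate this definition numerically (an aside, not taken from the paper), the sketch below approximates $\Delta f$ for the degree-1 spherical harmonic $f(y) = y_3$ on $\mathbb{S}^2$ by applying central second differences to the radial extension $g(x) = f(x/|x|)$; the outcome agrees with the well-known eigenvalue relation $\Delta Y_k = -k(k+d-1)Y_k$, which for $k = 1$, $d = 2$ gives $\Delta f = -2f$:

```python
import numpy as np

def f(y):
    # A degree-1 spherical harmonic on S^2: the coordinate function y_3.
    return y[2]

def g(x):
    # Radial extension g(x) = f(x / |x|) used in the definition of the Laplace-Beltrami operator.
    return f(x / np.linalg.norm(x))

def laplace_beltrami(x, h=1e-4):
    # Sum of second central differences of g along the coordinate axes, evaluated at |x| = 1.
    val = 0.0
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        val += (g(x + e) - 2.0 * g(x) + g(x - e)) / h ** 2
    return val

x = np.array([0.6, 0.0, 0.8])            # a point on S^2
print(laplace_beltrami(x))               # approximately -1.6
print(-1 * (1 + 2 - 1) * f(x))           # -k(k+d-1) * f(x) with k = 1, d = 2, i.e. -2 * 0.8
```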

Upper bound of approximation

Let $C(\mathbb{S}^d)$ be the set of continuous functions on $\mathbb{S}^d$. In this section, we prove that for any $f \in C(\mathbb{S}^d)$ and any $\varepsilon > 0$, there exists an SNN, $N_{\phi,n}$, with an analytic, strictly increasing and sigmoidal activation function and $n \asymp s^{d-1}$ neurons such that $\|f - N_{\phi,n}\|_\infty \le C\, \mathrm{dist}(f, \Pi_s^d, C(\mathbb{S}^d)) + \varepsilon$, where $\|\cdot\|_\infty$ denotes the uniform norm on $\mathbb{S}^d$.
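
To see how such a comparison yields the rate announced in the introduction, assume the classical polynomial rate $\mathrm{dist}(W_2^{2r}, \Pi_s^d, L^2(\mathbb{S}^d)) \asymp s^{-2r}$ (which is consistent with Theorem 5.1 in the concluding section; this short derivation is a reasoning aid, not part of the original text). The choice $n \asymp s^{d-1}$ gives $s \asymp n^{1/(d-1)}$, and hence

$$\mathrm{dist}(W_2^{2r}, \Pi_s^d, L^2(\mathbb{S}^d)) \asymp s^{-2r} \asymp \left(n^{1/(d-1)}\right)^{-2r} = n^{-2r/(d-1)},$$

which is exactly the upper rate for SNNs stated in the introduction.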

The following Lemma 3.1 proved by Maiorov and Pinkus (1999) will play a crucial role in our proof.

Lemma 3.1

There exists a function $\phi$ which is real analytic, strictly increasing, and sigmoidal

Lower bound of approximation

In this section, motivated by Maiorov (1999), we prove that the upper and lower rates of approximation by SNNs are identical, behaving asymptotically as $n^{-2r/(d-1)}$.

Let $E_m$ be the vector set consisting of all vectors $\varepsilon := (\varepsilon_1, \ldots, \varepsilon_m)$, $m \in \mathbb{N}$, with coordinates $\varepsilon_1, \ldots, \varepsilon_m = \pm 1$, i.e., $E_m := \{\varepsilon = (\varepsilon_1, \ldots, \varepsilon_m) : \varepsilon_i = \pm 1,\ i = 1, 2, \ldots, m\}$. Let $m, s, p$ and $q$ be natural numbers. Let $\pi_{ij}(\sigma)$, $i = 1, \ldots, m$; $j = 1, \ldots, q$, be any algebraic polynomials with real coefficients in the variables $\sigma = (\sigma_1, \ldots, \sigma_p) \in \mathbb{R}^p$, each of degree $s$. Construct the polynomials in the $p+q$

Conclusions and remarks

In Section 3, we obtained the upper bound of approximation by SNNs with an analytic, strictly increasing, sigmoidal activation function. In Section 4, we deduced the lower bound of approximation by SNNs with a square integrable activation function. Combining these, we obtain the following Theorem 5.1.

Theorem 5.1

If $n \asymp s^{d-1}$, then there exists a $\phi: \mathbb{R} \to \mathbb{R}$ which is analytic, strictly increasing, sigmoidal and square Lebesgue integrable such that $\mathrm{dist}(W_2^{2r}, \Phi_{\phi,n}, L^2(\mathbb{S}^d)) \asymp \mathrm{dist}(W_2^{2r}, \Pi_s^d, L^2(\mathbb{S}^d)) \asymp n^{-2r/(d-1)}$.

From Theorem 5.1 it
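
As a concrete instance of Theorem 5.1 (combined with the ZFN rate quoted in the introduction), take $d = 2$, i.e. approximation on the ordinary sphere $\mathbb{S}^2$. Then $n \asymp s^{d-1} = s$ and

$$\mathrm{dist}(W_2^{2r}, \Phi_{\phi,n}, L^2(\mathbb{S}^2)) \asymp n^{-2r/(d-1)} = n^{-2r}, \qquad \text{while the ZFN rate of Mhaskar et al. (1999) is } \mathcal{O}(n^{-2r/d}) = \mathcal{O}(n^{-r}),$$

so, with the same number of neurons, SNNs with thresholds achieve an essentially faster rate in this case.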

Acknowledgments

The authors wish to thank the referees for their helpful suggestions. The research was supported by the National 973 Project (2007CB311002) and the National Natural Science Foundation of China (Nos. 90818020, 60873206).

References (42)

  • Y. Makovoz, Uniform approximation by neural networks, Journal of Approximation Theory (1998)
  • H.N. Mhaskar, Weighted quadrature formulas and approximation by zonal function networks on the sphere, Journal of Complexity (2006)
  • H.N. Mhaskar et al., Degree of approximation by neural networks with a single hidden layer, Advances in Applied Mathematics (1995)
  • H.N. Mhaskar et al., Zonal function network frames on the sphere, Neural Networks (2003)
  • S. Suzuki, Constructive function approximation by three-layer artificial neural networks, Neural Networks (1998)
  • T.F. Xie et al., The errors in simultaneous approximation by feed-forward neural networks, Neurocomputing (2010)
  • A.R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Transactions on Information Theory (1993)
  • Cao, F. L., Lin, S. B., & Xu, Z. B. (2010). The best approximation for polynomials and neural networks on the unit...
  • T.P. Chen et al., Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems, IEEE Transactions on Neural Networks (1995)
  • G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems (1989)
  • F. Dai et al., Jackson inequality for Banach spaces on the sphere, Acta Mathematica Hungarica (2008)