
Neural Networks

Volume 10, Issue 8, November 1997, Pages 1455-1463

CONTRIBUTED ARTICLE
Some numerical aspects of the training problem for feed-forward neural nets

https://doi.org/10.1016/S0893-6080(97)00015-4

Abstract

This paper considers the feed-forward training problem from the numerical point of view, in particular the conditioning of the problem. It is well known that the feed-forward training problem is often ill-conditioned; this affects the behaviour of training algorithms, the choice of such algorithms and the quality of the solutions achieved. A geometric interpretation of ill-conditioning is explored and an example of function approximation is analysed in detail.

Introduction

It is now well established that feed-forward neural networks can be used as universal approximators. The training of such networks in order to approximate functions is often formulated as a nonlinear least-squares problem, numerically equivalent to nonlinear regression. A major body of optimisation techniques is therefore available to analyse and solve it. An important factor in the performance of such methods, as well as the confidence which can be placed in the solutions they obtain, is the numerical conditioning of the error function. Ill-conditioning may conveniently be thought of as the situation in which the value of a function of several variables is locally very much more sensitive to changes in the values of certain combinations of its variables than it is to other variations. Although some authors have discussed aspects of this problem, the implications have perhaps not been widely realised. As evidence of this, examples can be found in the recent literature of problems formulated in such a way that extreme ill-conditioning is guaranteed by the inadequate amount of training data used. Even when this is not the case, ill-conditioning is definitely a common feature of the training problem and cannot be ignored (Dixon and Mills, 1991; Saarinen et al., 1991; Ellacott, 1993). The paper by Saarinen et al., in particular, covers similar ground to the present one in considerable detail but without the use of a geometrical interpretation of ill-conditioning; it does not seem to have been as widely appreciated as it might have been. It is hoped that the present discussion will contribute to the debate by clarifying some of these issues, particularly by a detailed analysis of an instance of a functional approximation problem.

Section snippets

Summary of the nonlinear least-squares problem

The supervised training problem for feed-forward artificial neural networks can be formulated as one of minimising, as a function of the weights, the sum of squares of differences between the predicted and training outputs. The number of variables (n) is usually equal to the sum of the number of connections and neurons in the network, while the number (m) of differences, or residual errors, is equal to the product of the number of outputs from the network (o) and the number of training sets (p).
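
As a concrete, minimal illustration of this formulation, the sketch below stacks the o outputs for each of the p training patterns into a single residual vector of length m = o·p and forms the sum-of-squares error. The names residuals, sum_of_squares and the predict callback are purely illustrative and are not taken from the paper.

```python
import numpy as np

def residuals(weights, inputs, targets, predict):
    """Stack the m = o * p residual errors of the least-squares formulation.

    `predict(weights, x)` is any feed-forward map returning the o network
    outputs for one input pattern; `inputs` and `targets` hold the p
    training patterns.  All names here are illustrative only.
    """
    return np.concatenate([predict(weights, x) - t
                           for x, t in zip(inputs, targets)])

def sum_of_squares(weights, inputs, targets, predict):
    # The error function to be minimised over the n weights; whether a
    # factor of 1/2 is included is a matter of convention.
    r = residuals(weights, inputs, targets, predict)
    return 0.5 * float(r @ r)
```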

Feed-forward training problems are ill-conditioned

It has been observed that the feed-forward training problem is almost invariably ill-conditioned or singular. In this respect it is far from being unique among nonlinear regression problems. There are several distinct sources of ill-conditioning in the case of neural networks, the following being perhaps the main ones.
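
One practical way to see this ill-conditioning on a given problem, not taken from the paper but consistent with its analysis, is to examine the singular values of the Jacobian of the residual vector: a large ratio between the extreme singular values (squared, for the Gauss-Newton approximation J^T J to the Hessian) signals the flat valleys discussed later. The finite-difference sketch below reuses the hypothetical residuals helper from the previous sketch.

```python
import numpy as np

def jacobian_fd(weights, inputs, targets, predict, eps=1e-6):
    """Forward-difference Jacobian of the residual vector w.r.t. the weights."""
    r0 = residuals(weights, inputs, targets, predict)
    J = np.empty((r0.size, weights.size))
    for k in range(weights.size):
        w = weights.copy()
        w[k] += eps
        J[:, k] = (residuals(w, inputs, targets, predict) - r0) / eps
    return J

def condition_report(J):
    # Singular values of J; their spread (squared, for the Gauss-Newton
    # approximation J^T J to the Hessian) quantifies the ill-conditioning.
    s = np.linalg.svd(J, compute_uv=False)
    print("largest / smallest singular value of J:", s[0], s[-1])
    print("condition number of J^T J:", (s[0] / s[-1]) ** 2)
```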

Example

Consider the simple problem of constructing a feed-forward network to approximate the function y=tan(x) (Fig. 1). A training set consisting of 18 patterns was constructed by computing y at 20 points uniformly distributed in the interval [0,π] and then discarding the two points on either side of the discontinuity at π/2 in order to keep the values of y within a reasonable range.
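
A minimal reconstruction of that training set is sketched below; the text does not state whether the 20 points include the interval endpoints, so an inclusive, equally spaced grid is assumed.

```python
import numpy as np

# 20 points on [0, pi]; the snippet does not say whether the endpoints are
# included, so an inclusive, equally spaced grid is assumed here.
x = np.linspace(0.0, np.pi, 20)
y = np.tan(x)

# Discard the two points closest to the discontinuity at pi/2 (one on each
# side), leaving the 18 training patterns described above.
keep = np.sort(np.argsort(np.abs(x - np.pi / 2))[2:])
x_train, y_train = x[keep], y[keep]
assert x_train.size == 18
```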

Fig. 2 shows the 1/4/1 network chosen for the approximation. Let w_{ij}^{k} denote the weight on the link from node i in layer (k-1) to node j in layer k.
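
The forward pass of such a 1/4/1 network might look as follows. Since Fig. 2 and the excitation function are not reproduced in the snippet, a tanh hidden layer, a linear output and one bias per neuron are assumed here, giving 13 adjustable parameters (8 connections plus 5 neurons, matching the definition of n above).

```python
import numpy as np

def predict_141(weights, x):
    """Forward pass of a 1/4/1 network for a scalar input x.

    Assumed layout (Fig. 2 is not reproduced in the snippet): 4 input-to-hidden
    weights, 4 hidden biases, 4 hidden-to-output weights and 1 output bias,
    i.e. 13 parameters.  A tanh hidden layer and a linear output are assumed;
    the paper's excitation function may differ.
    """
    w1, b1 = weights[0:4], weights[4:8]
    w2, b2 = weights[8:12], weights[12]
    h = np.tanh(w1 * x + b1)            # the four hidden activations
    return np.array([h @ w2 + b2])      # single (linear) network output
```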

Implications of ill-conditioning for the choice of training algorithm

The effect of ill-conditioning on the minimisation process is well understood. If first-order methods such as steepest descent are used, the iterations will tend to favour directions corresponding to large eigenvalues of the Hessian matrix. This results in extremely slow convergence along the flat valleys associated with ill-conditioning. Back-propagation is of course equivalent to steepest descent when used in batch mode, i.e. when all training sets are taken into account at each step.
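
The slow-down is easy to reproduce on a toy quadratic, which is not an example from the paper: with Hessian eigenvalues differing only by a factor of 10^3 and a near-maximal stable step length, exact steepest descent needs on the order of ten thousand iterations to creep along the valley floor.

```python
import numpy as np

# Toy model of a long, narrow valley: E(w) = 0.5 * w^T H w with Hessian
# eigenvalues 1 and 1e3 (a modest spread by neural-network standards).
H = np.diag([1.0, 1.0e3])
w = np.array([1.0, 1.0])
lr = 1.0e-3        # near the stability limit 2/lambda_max; removes the steep direction quickly
steps = 0
while np.linalg.norm(H @ w) > 1e-6:    # iterate until the gradient is negligible
    w -= lr * (H @ w)
    steps += 1
print(steps)       # roughly 1.4e4 steps, almost all spent on the flat direction
```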

Implications of ill-conditioning for global optimisation

The methods discussed above are intended to locate local minima, that is, points at each of which the error function has the lowest value in its immediate vicinity. Second-order methods, general or specialised, are in principle no more likely to find a global minimum than are first-order methods.

The problem of finding a global minimum even over a set of distinct, well-defined local minima is a difficult one because of the fundamental impossibility of recognising such a global minimum using only local information.
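
A common, if unsophisticated, response is multi-start local optimisation: run a local least-squares solver from many random initial weight vectors and keep the best local minimum found, accepting that nothing certifies it as global. The sketch below uses SciPy's least_squares (not an algorithm discussed in the paper) together with the assumed x_train, y_train and predict_141 reconstructions from the earlier sketches.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals_141(weights):
    # Residual vector for the tan(x) example, using the assumed helpers
    # x_train, y_train and predict_141 from the earlier sketches.
    return np.array([predict_141(weights, xi)[0] - yi
                     for xi, yi in zip(x_train, y_train)])

rng = np.random.default_rng(0)
best = None
for _ in range(25):                                      # 25 random restarts
    w0 = rng.normal(scale=1.0, size=13)
    res = least_squares(residuals_141, w0, method="lm")  # local least-squares solve
    if best is None or res.cost < best.cost:
        best = res
print(best.cost)   # the lowest *local* minimum found; nothing certifies it as global
```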

Conclusions

The training problem for neural networks is usually an ill-conditioned one. This property is inherent in the form of the networks and their excitation functions, but can be exacerbated by the use of insufficient numbers of training sets, or training sets deficient in independent information, or over-complex networks. The presence of ill-conditioning makes first-order minimisation methods unlikely to be efficient, but also affects second-order methods. It can make second-order methods such as

Acknowledgements

The work described in this paper was carried out with the help of funding from the European Community through the Human Capital and Mobility Project 'Models, algorithms and systems for decision making', contract no. CHRX-CT93-0087.

References (13)

  • Dixon, L. C. W. and Szego, G. P. (1975). Towards global optimisation. Amsterdam/New York: North-Holland/American...
  • Dixon, L. C. W. and Szego, G. P. (1978). Towards global optimisation 2. Amsterdam/New York: North-Holland/American...
  • Dixon, L. C. W. and Mills, D. J. (1991). Neural networks and nonlinear optimization I: the representation of continuous...
  • Ellacott, S. W. (1993). The numerical analysis approach. In J. Taylor (Ed.), Mathematical approaches to neural...
  • Gorse, D., Shepherd, A. and Taylor, J. G. (1994). A classical algorithm for avoiding local minima. 1994 International...
  • McKeown, J. J. (1975). Specialised versus general-purpose algorithms for minimizing functions that are sums of squared...
There are more references available in the full text version of this article.

Cited by (21)

  • A power-flow emulator approach for resilience assessment of repairable power grids subject to weather-induced failures and data deficiency

    2018, Applied Energy
    Citation excerpt:

    A surrogate model, also known as an emulator or meta-model, is a numerically cheap mathematical approximation of a computationally expensive realistic model [16]. Some examples of popular meta-models are Artificial Neural Networks [17,18], Poly-Harmonic Splines [19] and Kriging models [20]. Surrogates have been applied extensively to reduce the time expense of numerically burdensome models, and a few works have attempted to use meta-models to analyse power grids, see for instance [21–26].

  • Sound quality recognition using optimal wavelet-packet transform and artificial neural network methods

    2016, Mechanical Systems and Signal Processing
    Citation excerpt:

    In addition, to find the optimal weights in a short time, several improved algorithms for training the BP neural network have been proposed, such as the gradient-descent, quasi-Newton and Levenberg–Marquardt (LM) methods. Owing to its rapid convergence on nonlinear least-squares problems [46], the LM algorithm is adopted for the ANN training in this paper. In summary, the final parameters determined for the OWPT–ANN model are listed in Table 7.

  • Buckling analysis of a beam-column using multilayer perceptron neural network technique

    2013, Journal of the Franklin Institute
    Citation excerpt:

    Because of its versatility, the artificial neural network (ANN) has been applied extensively to problems in various fields by exploiting its function-approximation capability. Solving a differential equation with this technique requires training an ANN that calculates solution values at any point in the solution space, including points not considered during training, and it offers the following advantage over standard numerical methods [16–28]: the solution obtained via the ANN is differentiable, in closed analytic form, easily used in any subsequent calculation, and has very good generalization properties.

  • Solving initial-boundary value problems for systems of partial differential equations using neural networks and optimization techniques

    2009, Journal of the Franklin Institute
    Citation excerpt:

    In some cases, the learning process of multilayer perceptrons is time consuming and ill-conditioned. To avoid these difficulties, increasing the input information and making proper use of suitable error-minimization techniques are helpful [39]. For simplicity, here we consider N instead of Ni.

  • Numerical solution for high order differential equations using a hybrid neural network-Optimization method

    2006, Applied Mathematics and Computation
    Citation excerpt:

    Multi-layered feed-forward neural networks are trainable. In some cases the feed-forward training problem is ill-conditioned [20]. The method proposed here does not suffer from such difficulties.
