
Neural Networks

Volume 17, Issue 4, May 2004, Pages 589-609

New training strategies for constructive neural networks with application to regression problems

https://doi.org/10.1016/j.neunet.2004.02.002

Abstract

The regression problem is an important application area for neural networks (NNs). Among the large number of existing NN architectures, the feedforward NN (FNN) paradigm is one of the most widely used structures. Although one-hidden-layer feedforward neural networks (OHL-FNNs) have simple structures, they possess interesting representational and learning capabilities. In this paper, we are particularly interested in incremental constructive training of OHL-FNNs. In the proposed incremental constructive training schemes for an OHL-FNN, input-side training and output-side training may be separated in order to reduce the training time. A new technique is proposed that scales the error signal during the constructive learning process to improve the input-side training efficiency and to obtain better generalization performance. Two pruning methods for removing redundant input-side connections have also been applied. Numerical simulations demonstrate the potential and advantages of the proposed strategies when compared to other existing techniques in the literature.

Introduction

Among the numerous existing neural network (NN) paradigms, such as Hopfield networks, Kohonen's self-organizing feature maps (SOMs), etc., the feedforward NNs (FNNs) are the most popular due to their structural flexibility, good representational capabilities, and the large number of available training algorithms (Bose and Liang, 1996, Leondes, 1998, Lippmann, 1987, Sarkar, 1995). In this paper we will be mainly concerned with FNNs.

When using a NN, one needs to address three important issues. The solutions adopted for these issues significantly influence the overall performance of the NN with respect to two considerations: (i) the recognition rate for new patterns, and (ii) the generalization performance on new data sets that have not been presented during network training.

The first problem is the selection of data/patterns for network training. This problem has practical implications and has not received as much attention from researchers as the other two problems. The selection of the training data set can have considerable effects on the performance of the trained network. Some research on this issue has been conducted in Tetko (1997) (and the references therein).

The second problem is the selection of an appropriate and efficient training algorithm from a large number of possible training algorithms that have been developed in the literature, such as the classical error backpropagation (BP) (Rumelhart, Hinton, & Williams, 1986) and its many variants (Sarkar, 1995, Magoulas et al., 1997, Stager and Agarwal, 1997) and the second-order algorithms (Shepherd, 1997, Osowski et al., 1996), to name a few. Many new training algorithms with faster convergence properties and less computational requirements are being developed by researchers in the NN community.

The third problem is the determination of the network size. From a practical point of view this problem is more important than the above two, and it is generally more difficult to solve. The goal is to find a network structure that is as small as possible while still meeting the desired performance specifications. What is usually done in practice is that the developer trains a number of networks of different sizes and then selects the smallest network that can fulfill all or most of the performance requirements. This amounts to a tedious process of trial and error that unfortunately seems to be unavoidable. This paper focuses on developing a systematic procedure for the automatic determination and/or adaptation of the network architecture for a FNN.

The second and third problems are actually closely related to one another in the sense that different training algorithms are suitable for different NN topologies. Therefore, the above three considerations are indeed critical when a NN is to be applied to a real-life problem. Consider a data set generated by an underlying function. This situation usually occurs in pattern classification, function approximation, and regression problems. The problem is to find a model that can represent the input–output relationship of the data set. The model is to be determined or trained based on the data set so that it can predict within some prespecified error bounds the output to any new input pattern. In general, a FNN can solve this problem if its structure is chosen appropriately. Too small a network may not be able to learn the inherent complexities present in the data set, but too large a network may learn ‘unimportant’ details such as observation noise in the training samples, leading to ‘overfitting’ and hence poor generalization performance. This is analogous to the situation when one uses polynomial functions for curve fitting problems. Generally acceptable results cannot be achieved if too few coefficients are used, since the characteristics or features of the underlying function cannot be captured completely. However, too many coefficients may not only fit the underlying function but also the noise contained in the data, yielding a poor representation of the underlying function. When an ‘optimal’ number of coefficients are used, the fitted polynomial will then yield the ‘best’ representation of the function and also the best prediction for any new data.
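To make the curve-fitting analogy concrete, the following minimal sketch (in Python with NumPy; the underlying function, noise level, sample size, and polynomial degrees are illustrative choices, not taken from the paper) fits polynomials of increasing degree to noisy samples and compares the training error with the error on held-out points:

    import numpy as np

    rng = np.random.default_rng(0)

    # Underlying function and noisy training observations (illustrative choices).
    g = lambda x: np.sin(2 * np.pi * x)
    x_train = rng.uniform(0.0, 1.0, 20)
    d_train = g(x_train) + 0.1 * rng.standard_normal(x_train.size)

    # Dense noise-free grid used only to measure generalization.
    x_test = np.linspace(0.0, 1.0, 200)
    d_test = g(x_test)

    for degree in (1, 5, 15):
        coeffs = np.polyfit(x_train, d_train, degree)  # least-squares polynomial fit
        train_mse = np.mean((np.polyval(coeffs, x_train) - d_train) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x_test) - d_test) ** 2)
        print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")

Typically the low-degree fit underfits (large error on both sets), while the highest-degree fit drives the training error down but generalizes poorly, mirroring the too-small versus too-large network trade-off described above.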

A similar situation arises in the application of NNs, where it is also imperative to relate the architecture of the NN to the complexity of the problem. Obviously, algorithms that can determine an appropriate network architecture automatically according to the complexity of the underlying function embedded in the data set are very cost-efficient, and thus highly desirable. Efforts toward the network size determination have been made in the literature for many years, and many techniques have been developed (Hush and Horne, 1993, Kwok and Yeung, 1997a) (and the references therein). Towards this end, in Section 2, we review three general methods that deal with the problem of automatic NN structure determination.

Section snippets

Pruning algorithms

One intuitive way to determine the network size is to first establish by some means a network that is considered to be sufficiently large for the problem being considered, and then trim the unnecessary connections or units of the network to reduce it to an appropriate size. This is the basis for the pruning algorithms. Since it is much ‘easier’ to determine or select a ‘very large’ network than to find the proper size needed, the pruning idea is expected to provide a practical but a partial
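The snippet above does not fix a particular pruning criterion; as a minimal illustration of the train-large-then-trim idea, the sketch below (Python/NumPy, with a simple magnitude threshold that is only one of many possible saliency measures) removes the weakest connections of an already trained, deliberately oversized weight matrix:

    import numpy as np

    def prune_small_weights(W, threshold):
        """Zero out connections whose magnitude falls below the threshold.

        W is assumed to be a weight matrix of an already trained, oversized
        network; magnitude is a crude saliency measure, and criteria such as
        optimal brain damage use second-order information instead.
        """
        mask = np.abs(W) >= threshold
        return W * mask, mask

    # Illustrative usage on a random stand-in for a trained weight matrix.
    rng = np.random.default_rng(1)
    W = rng.standard_normal((10, 5))
    W_pruned, mask = prune_small_weights(W, threshold=0.5)
    print(f"kept {mask.sum()} of {mask.size} connections")

After such a trimming pass, the reduced network is normally retrained, and the prune/retrain cycle is repeated until performance starts to degrade.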

Constructive algorithms for feedforward neural networks

In this section, we first give a simple formulation of the training problem for a constructive OHL-FNN in the context of a nonlinear optimization problem. The advantages and disadvantages of these constructive algorithms are also discussed.

Statement of the problem

Generally, a multivariate model-free regression problem can be described as follows. Suppose one is given P pairs of vectors
$$(\mathbf{d}^{j},\mathbf{x}^{j})=(d_{1}^{j},d_{2}^{j},\ldots,d_{N}^{j};\,x_{1}^{j},x_{2}^{j},\ldots,x_{M}^{j}),\qquad j=1,2,\ldots,P,$$
that are generated from the unknown models
$$d_{i}^{j}=g_{i}(\mathbf{x}^{j})+\epsilon_{i}^{j},\qquad i=1,2,\ldots,N,\quad j=1,2,\ldots,P,$$
where the $\{\mathbf{d}^{j}\}$'s are called the multivariate 'response' vectors, the $\{\mathbf{x}^{j}\}$'s are called the 'independent variables' or the 'carriers', and M and N are the dimensions of $\mathbf{x}$ and $\mathbf{d}$, respectively. The $g_{i}$'s are unknown smooth nonparametric or model-free functions
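As a concrete instance of this data model, the following sketch (Python/NumPy; the dimensions M and N, the particular functions $g_i$, and the noise level are illustrative assumptions) generates P training pairs according to $d_{i}^{j}=g_{i}(\mathbf{x}^{j})+\epsilon_{i}^{j}$:

    import numpy as np

    rng = np.random.default_rng(2)
    P, M, N = 500, 3, 2  # number of samples, input dimension, output dimension

    def g(x):
        """Stand-in for the unknown smooth functions g_i (here N = 2 outputs)."""
        return np.array([np.sin(x[0]) + x[1] * x[2],
                         np.exp(-x[0] ** 2) + 0.5 * x[1]])

    X = rng.uniform(-1.0, 1.0, (P, M))          # carriers x^j
    D = np.array([g(x) for x in X])             # noise-free responses g_i(x^j)
    D += 0.05 * rng.standard_normal((P, N))     # additive noise terms eps_i^j

A regression network is then trained on the pairs (D, X) without any knowledge of g or of the noise distribution.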

Error scaling strategy for input-side training

In this section, the features of a correlation-based objective function are investigated. Without loss of generality, a regression problem with only one output is considered. The correlation-based objective function in this case is given as follows (Fahlman & Lebiere, 1991):
$$J_{\mathrm{input}}=\left|\sum_{j=1}^{P}\left(e_{n-1}^{j}-\bar{e}_{n-1}\right)\left(f_{n}(s_{n}^{j})-\bar{f}_{n}\right)\right|,$$
where $\bar{e}_{n-1}=\frac{1}{P}\sum_{j=1}^{P}e_{n-1}^{j}$ and $\bar{f}_{n}=\frac{1}{P}\sum_{j=1}^{P}f_{n}(s_{n}^{j})$, with $\bar{e}_{n-1}$ and $\bar{f}_{n}$ denoting the mean values of the training error and the output of the $n$-th hidden unit over the entire training
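A minimal sketch of evaluating this objective for a candidate hidden unit is given below (Python/NumPy; the sigmoidal activation and the random test data are illustrative assumptions, and neither the input-side gradient training nor the error-scaling technique proposed in the paper is reproduced here):

    import numpy as np

    def correlation_objective(e_prev, X, w, b):
        """Correlation-based objective J_input for one candidate hidden unit.

        e_prev : residual training errors e_{n-1}^j of the current network, shape (P,)
        X      : input patterns x^j, shape (P, M)
        w, b   : input-side weights and bias of the candidate (assumed sigmoidal)
        """
        s = X @ w + b                        # net input s_n^j
        f = 1.0 / (1.0 + np.exp(-s))         # candidate output f_n(s_n^j)
        e_c = e_prev - e_prev.mean()         # centred residual error
        f_c = f - f.mean()                   # centred candidate output
        return np.abs(np.sum(e_c * f_c))     # |covariance| between error and output

    # Illustrative usage with random residuals and a random candidate.
    rng = np.random.default_rng(3)
    P, M = 200, 4
    X = rng.standard_normal((P, M))
    e_prev = rng.standard_normal(P)
    w, b = rng.standard_normal(M), 0.0
    print(correlation_objective(e_prev, X, w, b))

Input-side training adjusts w and b so as to maximize this quantity, i.e. to make the new unit's output as correlated as possible with the residual error left by the existing network.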

Input-side pruning strategies

In the input-side training, one can have one or a pool of candidates to train a new hidden unit. In the latter case, the neuron that results in the maximum objective function will be selected as the best candidate. This candidate is incorporated into the network and its input-side weights are frozen in the subsequent training process that follows. However, certain input-side weights may not contribute much to the maximization of the objective function or indirectly to the reduction of the
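As sketched below (Python/NumPy; the pool size, the random initialization, and the reuse of the correlation objective from the previous section are illustrative assumptions, and the input-side training of each candidate is omitted for brevity), the pool-based strategy simply evaluates every candidate and keeps the one with the largest objective value:

    import numpy as np

    def correlation_score(e_prev, X, w, b):
        """|covariance| between the residual error and a sigmoidal candidate's output."""
        f = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        return np.abs(np.sum((e_prev - e_prev.mean()) * (f - f.mean())))

    def select_best_candidate(e_prev, X, n_candidates, rng):
        """Keep the candidate that maximizes the correlation objective; in the
        constructive scheme its input-side weights are frozen once the unit is
        added to the network (each candidate would normally be trained first)."""
        M = X.shape[1]
        best_score, best_wb = -np.inf, None
        for _ in range(n_candidates):
            w = 0.5 * rng.standard_normal(M)      # candidate input-side weights
            b = 0.5 * rng.standard_normal()       # candidate bias
            score = correlation_score(e_prev, X, w, b)
            if score > best_score:
                best_score, best_wb = score, (w, b)
        return best_wb, best_score

    # Illustrative usage.
    rng = np.random.default_rng(4)
    X, e_prev = rng.standard_normal((200, 4)), rng.standard_normal(200)
    (w, b), score = select_best_candidate(e_prev, X, n_candidates=8, rng=rng)
    print(f"best candidate score: {score:.3f}")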

Convergence of the proposed constructive algorithm

For our proposed constructive OHL-FNN, the convergence of the algorithm with respect to the added hidden units is an important issue and needs careful investigation. First, we investigate an ideal case where assuming a ‘perfect’ input-side training the convergence of the constructive training algorithm with and without error scaling operations is determined. The ideal case can yield an ‘upperbound’ estimate on the convergence rate that the constructive training algorithm can theoretically
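For intuition only, a standard least-squares argument (a sketch under the assumption that the output-side weight of the new unit is fitted to the previous residual; not necessarily the bound derived in the paper) shows why the residual error cannot increase as hidden units are added. Writing $e_{n-1}=(e_{n-1}^{1},\ldots,e_{n-1}^{P})^{T}$ for the residual vector and $\tilde{f}_{n}$ for the centred output vector of the $n$-th hidden unit, the least-squares choice of the new output-side weight $\alpha_{n}$ gives
$$\alpha_{n}=\frac{\langle e_{n-1},\tilde{f}_{n}\rangle}{\|\tilde{f}_{n}\|^{2}},\qquad \|e_{n}\|^{2}=\|e_{n-1}-\alpha_{n}\tilde{f}_{n}\|^{2}=\|e_{n-1}\|^{2}-\frac{\langle e_{n-1},\tilde{f}_{n}\rangle^{2}}{\|\tilde{f}_{n}\|^{2}}\le\|e_{n-1}\|^{2}.$$
The size of the decrement grows with $|\langle e_{n-1},\tilde{f}_{n}\rangle|$, which is exactly the quantity that the correlation-based input-side objective (and its scaled variant) seeks to maximize.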

Conclusions

In this paper, a new constructive adaptive NN scheme is proposed that scales the error signal during the learning process to improve the effectiveness and efficiency of the input-side training, and to obtain better generalization performance. All the regression simulations that were performed, with input spaces of up to 13 dimensions, confirmed the effectiveness and superiority of the proposed technique. Further simulations for higher-dimensional input spaces will have to be performed to

Acknowledgements

This research was supported in part by the NSERC (Natural Sciences and Engineering Research Council of Canada) Discovery Grant number RGPIN-42515.

References (36)

  • Blake, C., & Merz, C. (1998). UCI Repository of machine learning databases. …
  • Bose, N. K., & Liang, P. (1996). Neural network fundamentals with graphs, algorithms, and applications.
  • Buntine, W. L., et al. (1991). Bayesian backpropagation. Complex Systems.
  • Castellano, G., et al. (1997). An iterative pruning algorithm for feedforward neural networks. IEEE Transactions on Neural Networks.
  • Chauvin, Y. A back-propagation algorithm with optimal use of hidden units.
  • Le Cun, Y., et al. Optimal brain damage.
  • Draelos, T., et al. (1996). A constructive neural network algorithm for function approximation.
  • Fahlman, S. E., & Lebiere, C. (1991). The cascade-correlation learning architecture.