Neurocomputing

Volume 41, Issues 1–4, October 2001, Pages 153–172

Improved RAN sequential prediction using orthogonal techniques

https://doi.org/10.1016/S0925-2312(00)00363-5

Abstract

A new learning strategy for time-series prediction using radial basis function (RBF) networks is introduced. Its potential is examined in the particular case of the resource allocating network (RAN) model, although the same ideas could be applied to extend any other procedure. In the early stages of learning, the addition of successive new groups of RBFs provides an increased rate of convergence. At the same time, the optimum lag structure is determined using orthogonal techniques such as QR factorization and singular value decomposition (SVD). We claim that the same techniques can be applied to the pruning problem, and thus they are a useful tool for compaction of information. Our comparison with the original RAN algorithm shows a comparable error measure but much smaller networks. The extra effort required by QR and SVD is balanced by the simplicity of using only least mean squares for the iterative parameter adaptation.

Introduction

In this paper we consider a solution to prediction problems using a learning scheme for artificial neural networks known as the resource allocating network (RAN) [11]. From the architectural point of view, RANs are composed of a number of nodes based on radial basis function (RBF) kernels that might be considered as interpolants of an intrinsic multidimensional function defined over a given input space. The usual RBF activation function (i.e., the Gaussian function) acts as a locally tuned feature detector, as pointed out by Moody and Darken [10]. This fits well into the parallel and distributed computation paradigm that is characteristic of all neural network schemes.

As is well known, a general RBF network can be expressed mathematically as a linear expansion of M nodes (hidden neuronal units) characterized by their centers {c_i} and widths {σ_i}. Their linear combination using weights {θ_i} and a bias θ_0 is given by the following equation (throughout this article, the notation ||·|| stands for the Euclidean, or L2, norm):

$$o(\mathbf{x}) = \theta_0 + \sum_{i=1}^{M} o_i(\mathbf{x}) = \theta_0 + \sum_{i=1}^{M} \theta_i \exp\left( -\frac{\|\mathbf{x} - \mathbf{c}_i\|^2}{\sigma_i^2} \right),$$

where we use the notation o(x) for the network output when an input vector x is presented. The output of each individual neuron within this network is denoted by o_i(x).
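
As a concrete illustration of this expansion, the following minimal NumPy sketch evaluates the network output for a single input vector; the function and variable names are ours, not part of the original formulation.

```python
import numpy as np

def rbf_output(x, centers, widths, theta, theta0):
    """Evaluate o(x) = theta0 + sum_i theta_i * exp(-||x - c_i||^2 / sigma_i^2).

    x: (d,) input; centers: (M, d); widths: (M,); theta: (M,); theta0: scalar.
    """
    sq_dist = np.sum((centers - x) ** 2, axis=1)      # ||x - c_i||^2 for every node
    return theta0 + theta @ np.exp(-sq_dist / widths ** 2)
```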

Among all learning algorithms that have been devised for training such RBF neural networks, the innovative idea in the RAN scheme proposed by Platt [11] was to introduce a constructive procedure with the aim of rendering networks closely reflecting the underlying complexity of the training data. By using a two-fold condition based on error level as well as on distance with respect to previously allocated centers, networks using Platt's rule overcome the drawbacks of fixed-size networks, and they are able to approximate and generalize relatively quickly. This makes them suitable for sequential learning, which is useful for on-line prediction and control.
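
A sketch of this two-fold condition, under the usual reading of Platt's criteria (prediction error above a threshold eps and distance to the nearest existing center above a threshold delta), might look as follows; all names are illustrative.

```python
import numpy as np

def ran_should_grow(x, y, centers, widths, theta, theta0, eps, delta):
    """Platt's two-fold novelty test (sketch): allocate a new RBF only when the
    prediction error is large AND x lies far from every existing center."""
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    e = y - (theta0 + theta @ np.exp(-sq_dist / widths ** 2))   # current error
    return abs(e) > eps and np.sqrt(sq_dist.min()) > delta      # both must hold
```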

Many enhancements to Platt's original algorithm have been presented in the literature. Least mean squares (LMS) [15] adjustment of parameters between addition of successive nodes was replaced by an approach based on the extended Kalman filter (EKF) in Kadirkamanathan and Niranjan's work [3].
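
For reference, the LMS adaptation used between node allocations reduces to a single Widrow-Hoff gradient step. The sketch below updates only the linear coefficients, with a placeholder learning rate, and omits the center adaptation that Platt's algorithm also performs.

```python
import numpy as np

def lms_step(x, y, centers, widths, theta, theta0, alpha=0.02):
    """One LMS (Widrow-Hoff) step on the linear coefficients theta, theta0;
    returns the updated values. alpha is a placeholder learning rate."""
    a = np.exp(-np.sum((centers - x) ** 2, axis=1) / widths ** 2)  # Gaussian activations
    e = y - (theta0 + theta @ a)                                   # instantaneous error
    return theta + alpha * e * a, theta0 + alpha * e
```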

The inclusion of a pruning strategy that removes nodes whose contribution to the output is decreasing was proposed later by Yingwei et al. [16], [17] in their M-RAN model. Basically, the normalized output |o_i(x)|/o_max(x) is computed for each RBF, i=1,2,…,M, where o_max(x)=max_{1≤j≤M}{|o_j(x)|} and M is the actual number of nodes. If for some unit i this ratio remains below a predefined threshold δ for a given number of iterations, the node is considered to contribute little to the overall output and is thus removed from the network. This improvement preserves parsimony and yields RBF networks with fewer hidden neurons and the same or better accuracy than [11] on several benchmarks.
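
This test can be sketched as follows, with low_counts holding the per-unit tally of consecutive low-contribution steps and patience the iteration limit; both names are ours.

```python
import numpy as np

def mran_prune_update(x, centers, widths, theta, delta, low_counts, patience):
    """Relative-contribution test of Yingwei et al. (sketch): unit i is flagged
    once |o_i(x)| / max_j |o_j(x)| stays below delta for `patience` inputs."""
    o = theta * np.exp(-np.sum((centers - x) ** 2, axis=1) / widths ** 2)  # o_i(x)
    r = np.abs(o) / np.max(np.abs(o))                  # normalized contributions
    low_counts = np.where(r < delta, low_counts + 1, 0)
    return low_counts, low_counts >= patience          # updated tallies, removal mask
```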

A recent extension was that of Rosipal et al. [12], also using node pruning techniques as well as recursive least squares [18] with QR decomposition for adaptation of the linear output coefficients. The error rates for this new algorithm are clearly better than those of the original RAN procedure.

The rest of this paper is structured as follows: in Section 2, we formulate the problem of series prediction and discuss the basic ideas of the singular value decomposition (SVD) and QR techniques for compaction of information. The functionality of the proposed algorithm is described in Section 3 and experimental results obtained with it are discussed in Section 4. Finally, we make some concluding remarks and list the cited references.

Section snippets

Problem formulation

Throughout this article we will focus on a solution to time-series prediction problems. These may be formulated as follows: at a given time N, a “window” of τ_w past values of a dynamical process,

x(N−τ_w+1), …, x(N−1), x(N),

is considered, and the task involved is to predict a future value x(N+τ_h) based on this past information. Depending on the magnitude of the horizon τ_h, this task is referred to as “short-term” or “long-term” forecasting. Problems of this type arise in many areas of biological and
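
To make the windowing concrete, a minimal sketch for building (input, target) pairs from a scalar series could read as follows; names are illustrative.

```python
import numpy as np

def make_windows(series, tau_w, tau_h):
    """Build (input, target) pairs: tau_w past values predict x(N + tau_h)."""
    X, y = [], []
    for n in range(tau_w - 1, len(series) - tau_h):
        X.append(series[n - tau_w + 1 : n + 1])   # x(N-tau_w+1), ..., x(N)
        y.append(series[n + tau_h])               # the value tau_h steps ahead
    return np.asarray(X), np.asarray(y)
```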

The algorithm

In this section a modification to Platt's RAN learning algorithm is introduced that makes use of SVD and QR-cp to adjust the lag structure of the network inputs and also to perform pruning of the nearly irrelevant RBFs that could, on occasion, slip into the network. The reader should note that, though explained in the particular context of the RAN algorithm, the discussion here could be adapted to extend any available RBF network procedure. We will first study how the algorithm determines the
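
While the paper's exact procedure is only summarized in this snippet, a generic rank-revealing sketch of the SVD/QR-cp combination on a matrix of candidate lag columns might look as follows: the SVD estimates how many columns carry most of the signal energy, and QR with column pivoting ranks the actual columns. The energy threshold is an assumed parameter, not a value from the paper.

```python
import numpy as np
from scipy.linalg import qr

def select_lags(X, energy=0.99):
    """Rank-revealing sketch: SVD decides how many lag columns to keep,
    QR with column pivoting decides which ones."""
    s = np.linalg.svd(X, compute_uv=False)                     # singular values
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)                  # cumulative energy
    k = min(int(np.searchsorted(frac, energy)) + 1, X.shape[1])
    _, _, piv = qr(X, mode='economic', pivoting=True)          # pivot order = relevance
    return np.sort(piv[:k])                                    # indices of retained lags
```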

Experiments and comparison

This section discusses the experiments performed using the above-described algorithm and compares its performance and characteristics to those of the original RAN algorithm. Throughout this discussion, we will refer to the solution of the long-term forecasting problem for the Mackey–Glass equation (2). A set of 8000 consecutive data points was obtained by simulating this equation with a=0.2, b=0.1, T=17 and x(t−T)=0.3 for t<T. A Runge–Kutta third-order method with step size 0.1 was used with
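
For reproducibility, the Mackey–Glass series can be generated along these lines. The sketch below uses a simple forward-Euler integrator rather than the third-order Runge–Kutta scheme reported in the paper, so it is an approximation for illustration only.

```python
import numpy as np

def mackey_glass(n_points, a=0.2, b=0.1, T=17.0, h=0.1, x0=0.3):
    """Generate the Mackey-Glass series
    dx/dt = a*x(t-T) / (1 + x(t-T)**10) - b*x(t)
    with constant pre-history x = x0, using a forward-Euler step of size h."""
    d = int(round(T / h))                   # delay expressed in grid steps
    x = np.full(n_points + d, x0)           # constant history for t < T
    for t in range(d, n_points + d - 1):
        xd = x[t - d]                       # delayed value x(t - T)
        x[t + 1] = x[t] + h * (a * xd / (1.0 + xd ** 10) - b * x[t])
    return x[d:]
```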

Conclusions

A detailed study has been made of the possibility of using the SVD and QR-cp techniques for compaction of information in two ways: the determination of an optimum set of prediction lags (which is equivalent to the appropriate input structure of the RBF network) and the pruning of irrelevant RBFs within the same network. We have also shown the effect of a new strategy for initial allocation of resources, which enables us to allocate most of the RBFs in the first stages in the form of small

Acknowledgements

This research was partially supported by the CICYT, Spain, under projects TIC97-1149 and TIC98-0982. We wish to thank Rafael José Yáñez García of the Applied Mathematics Department at the University of Granada for his kind provision of some of the bibliographic sources used in this paper. We are also very grateful to the anonymous reviewers for their constructive comments and helpful remarks.

References (18)

  • P.P. Kanjilal et al., The Singular Value Decomposition — applied in the modeling and prediction of quasiperiodic processes, Signal Processing (1994)
  • G.E.P. Box et al., Time Series Analysis: Forecasting and Control (1994)
  • G.H. Golub et al., Matrix Computations (1989)
  • V. Kadirkamanathan et al., A function estimation approach to sequential learning with neural networks, Neural Comput. (1993)
  • P.P. Kanjilal et al., On the application of orthogonal transformation for the design and analysis of feedforward networks, IEEE Trans. Neural Networks (1995)
  • V.C. Klema et al., The Singular Value Decomposition: its computation and some applications, IEEE Trans. Automat. Control (1980)
  • A. Levin, T.K. Leen, J.E. Moody, Fast pruning using principal components, in: Advances in Neural Information Processing...
  • M.C. Mackey et al., Oscillation and chaos in physiological control systems, Science (1977)
  • H.A. Malki et al., Using the Karhunen–Loève transformation in the backpropagation training algorithm, IEEE Trans. Neural Networks (1991)

Cited by (53)

  • A growing and pruning sequential learning algorithm of hyper basis function neural network for function approximation

    2013, Neural Networks
    Citation Excerpt:

    However, the size of the sliding window heavily influences the speed of learning process and its size is determined by user’s experience during experimental process. Another improvement of RAN concept is based on QR decomposition (Salmeron, Ortega, Puntonet, & Prieto, 2001) which is used to determine the optimal structure of network. The disadvantage of this approach is a large number of new parameters, various thresholds and necessity for all data in pruning process.

  • Fixed budget quantized kernel least-mean-square algorithm

    2013, Signal Processing
    Citation Excerpt:

    Sensitivity analysis based pruning methods apply a series of sensitivity measures to qualify the system's perturbation after pruning. With orthogonal decomposition techniques, the data column that is less relevant can be removed from the radial basis function (RBF) network and a subset of regressor matrices that contain the essential time series information is preserved [22,23]. In addition, an important approach is to throw away the units resulting in a minimal influence on the system accuracy.

  • Data driven modeling based on dynamic parsimonious fuzzy neural network

    2013, Neurocomputing
    Citation Excerpt:

    Vice versa, all approaches listed in this paragraph do not have any rule pruning mechanisms integrated, which is beneficial in order to endow a compact and parsimonious rule base while retaining their predictive accuracy. In line with a rule base simplification purpose, several sequential pruning mechanisms have been proposed by researchers, for instances [3,6,7,31] which benefit some concepts of singular value decomposition (SVD) and of statistical methods. The approach in [6,7] discriminates with some approaches eliminating fuzzy rules in offline or post-processing manner like ERR method [12] in DFNN and GDFNN [13,14], as it quantifies the importance of a fuzzy rule with the use of the newest training datum.

  • The automatic model selection and variable kernel width for RBF neural networks

    2011, Neurocomputing
    Citation Excerpt:

    A significant improvement to RANEKF was made by Yingwei et al. [6] by introducing a pruning strategy based on the relative contribution of each hidden neuron to the overall network output. Other methods for pruning in RBF networks have been proposed in Salmeron et al. [7,8], Rojas et al. [9] and Huang et al. [10]. Instead, they all have various thresholds that have to be selected using exhaustive trial-and-error studies.


Moisés Salmerón received his B.Sc. degree in 1994 and his M.Sc. degree in 1997, both in Computer Science from the University of Granada, Spain. His final-year engineering project was on the design of testable neural networks. In September 1997 he joined the CASIP (Circuits and Systems for Information Processing) research group at the Department of Architecture and Computer Technology, where he is a researcher working towards his Ph.D. dissertation, which deals with neural networks for time series prediction. His current research interests are in the fields of neural networks, modern control theory, time series statistical modelling, and the application of matrix techniques such as SVD, PCA and orthogonal transformations to the design of neural networks and hybrid systems for time series prediction.

Julio Ortega received a B.Sc. degree in electronic physics in 1985, an M.Sc. degree in electronics in 1986, and a Ph.D. degree in 1990, all from the University of Granada, Spain. His Ph.D. dissertation won the Ph.D. Dissertation Award of the University of Granada. He was an invited researcher at the Open University (UK) and at the Department of Electronics, University of Dortmund (Germany). Currently, he is an associate professor at the Department of Computer Architecture and Technology at the University of Granada. His research interests lie in the fields of artificial neural networks, fuzzy logic, evolutionary computation, parallel processing and parallel architectures. He is a member of the IEEE Computer Society.

Carlos G. Puntonet received a B.Sc. degree in 1982, an M.Sc. degree in 1986, and his Ph.D. degree in 1994, all in electronic physics from the University of Granada, Spain. Currently, he is an Associate Professor at the Department of Computer Architecture and Technology at the University of Granada. His research interests lie in the fields of signal processing, independent component analysis and separation of sources using artificial neural networks.

Alberto Prieto received a B.Sc. degree in electronic physics in 1968 from the Complutense University (Madrid) and his Ph.D. degree from the University of Granada, Spain, in 1976, obtaining the Award of Ph.D. Dissertations and the Citema Foundation National Award. From 1969 to 1970 he was at the “Centro de Investigaciones Técnicas de Guipuzcoa” and at the “E.T.S.I. Industriales” of San Sebastián. From 1971 to 1984 he was Director of the Computer Centre, and from 1985 to 1990 Dean of the Computer Science and Technology studies of the University of Granada. He is currently Full Professor and Director of the Department of Computer Architecture and Technology at the University of Granada. He has been a visiting researcher at different foreign centres, including the University of Rennes (1975, Prof. Boulaye), the Open University (UK) (1987, Prof. S.L. Hurst), the Institut National Polytechnique de Grenoble (1991, Profs. J. Herault and C. Jutten), and University College London (1991–92, Prof. P. Treleaven). His research interests are in the area of intelligent systems. Prof. Prieto was the Organization Chairman of the “International Workshop on Artificial Neural Networks” (IWANN’91) (Granada, September 17–19, 1991) and of the “7th International Conference on Microelectronics for Neural, Fuzzy and Bio-inspired Systems” (MicroNeuro ’99) (Granada, Spain, April 7–9, 1999). He was nominated member of the IFIP WG (Neural Computer Systems) and Chairman of the Spanish RIG of the IEEE Neural Networks Council.

1. Partially supported by CICYT, Spain, under Project TIC97-1149.

2. Partially supported by CICYT, Spain, under Project TIC98-0982.
