Improved RAN sequential prediction using orthogonal techniques
Introduction
In this paper we consider a solution to prediction problems using a learning scheme for artificial neural networks known as resource allocating network (RAN) [11]. From the architectural point of view, RANs are composed of a number of nodes based on radial basis function (RBF) kernels that might be considered as interpolants of an intrinsic multidimensional function defined over a given input space. The usual RBF activation function (i.e., the Gaussian function) acts as a locally tuned feature detector, as pointed out by Moody and Darken [10]. This fits well into the parallel and distributed computation paradigm that is characteristic of all neural network schemes.
As is well known, a general RBF network can be expressed mathematically as a linear expansion of M nodes (hidden neuronal units) characterized by their centers {cᵢ} and widths {σᵢ}. Their linear combination using weights {θᵢ} and a bias θ₀ is given by (throughout this article, the notation ||·|| stands for the Euclidean, or L2, norm)

f(x) = θ₀ + Σᵢ₌₁ᴹ θᵢ φᵢ(x),

where f(x) denotes the network output when an input vector x is presented. The output of each individual neuron within this network is denoted by φᵢ(x) = exp(−||x − cᵢ||²/σᵢ²).
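The expansion above can be evaluated directly. The following is a minimal sketch (the function name `rbf_output` and the argument layout are ours, not from the paper) of the Gaussian-kernel network output:

```python
import numpy as np

def rbf_output(x, centers, widths, weights, bias):
    """f(x) = theta_0 + sum_i theta_i * phi_i(x), with Gaussian kernels
    phi_i(x) = exp(-||x - c_i||^2 / sigma_i^2)."""
    phi = np.exp(-np.sum((centers - x) ** 2, axis=1) / widths ** 2)
    return bias + weights @ phi
```

For an input sitting exactly on a center, the corresponding kernel contributes its full weight; for inputs far from all centers, the output decays to the bias, reflecting the locally tuned behavior noted above.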
Among all learning algorithms that have been devised for training such RBF neural networks, the innovative idea in the RAN scheme proposed by Platt [11] was to introduce a constructive procedure with the aim of rendering networks closely reflecting the underlying complexity of the training data. By using a two-fold condition based on error level as well as on distance with respect to previously allocated centers, networks using Platt's rule overcome the drawbacks of fixed-size networks, and they are able to approximate and generalize relatively quickly. This makes them suitable for sequential learning, which is useful for on-line prediction and control.
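Platt's two-fold novelty condition can be sketched as follows (a simplified illustration with hypothetical names; the actual rule also shrinks the distance threshold over time):

```python
import numpy as np

def should_add_node(x, error, centers, eps_dist, eps_err):
    """Two-fold novelty criterion: allocate a new RBF unit only when the
    input is far from every existing center AND the prediction error is
    large; otherwise existing parameters are adapted instead."""
    if len(centers) == 0:
        return True
    nearest = min(np.linalg.norm(x - c) for c in centers)
    return nearest > eps_dist and abs(error) > eps_err
```

Requiring both conditions at once is what keeps the network compact: a large error near an existing center is handled by parameter adaptation rather than by adding a redundant unit.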
Many enhancements to Platt's original algorithm have been presented in the literature. Least mean squares (LMS) [15] adjustment of parameters between addition of successive nodes was replaced by an approach based on the extended Kalman filter (EKF) in Kadirkamanathan and Niranjan's work [3].
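For reference, the LMS adjustment of the linear output layer between node additions amounts to a single gradient step on the squared error (a generic sketch, not the paper's exact update):

```python
import numpy as np

def lms_update(weights, bias, phi, error, lr=0.05):
    """One LMS step on the linear output layer: each weight moves along
    the gradient of the squared error, theta_i += lr * e * phi_i(x)."""
    return weights + lr * error * phi, bias + lr * error
```

The EKF replacement in [3] instead maintains a full error covariance over all network parameters, trading the O(M) cost of LMS for faster convergence per sample.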
The inclusion of a pruning strategy that removes nodes whose contribution to the output is decreasing was proposed later by Yingwei et al. [16], [17] in their M-RAN model. Basically, for each RBF the normalized output rᵢ = |oᵢ(x)| / max₁≤ⱼ≤M |oⱼ(x)| is computed, where oᵢ(x) = θᵢ φᵢ(x) denotes the contribution of the ith unit and M is the current number of nodes. If for some ith unit this ratio remains below a predefined threshold δ for a given number of consecutive iterations, the node is considered to contribute little to the overall output and is removed from the network. This improvement preserves parsimony, yielding RBF networks with fewer hidden neurons and the same or better accuracy than [11] on several benchmarks.
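The M-RAN pruning rule just described can be sketched as follows (a simplified illustration; `prune_indices` and the bookkeeping array are ours):

```python
import numpy as np

def prune_indices(outputs, delta, below_counts, window):
    """Normalize each unit's absolute output by the largest one; units
    staying below threshold delta for `window` consecutive steps are
    marked for removal. Returns indices to prune and updated counters."""
    r = np.abs(outputs) / np.max(np.abs(outputs))
    below_counts = np.where(r < delta, below_counts + 1, 0)
    to_remove = np.where(below_counts >= window)[0]
    return to_remove, below_counts
```

Counting consecutive iterations (rather than pruning on a single low reading) prevents a unit from being removed merely because the current input happens to fall outside its receptive field.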
A recent extension was that of Rosipal et al. [12], also using node pruning techniques as well as recursive least squares [18] with QR decomposition for adaptation of the linear output coefficients. The error rates for this new algorithm are clearly better than those of the original RAN procedure.
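The batch analogue of the QR-based least-squares adaptation in [12] is straightforward to state (a sketch of the underlying linear algebra, not the recursive update itself):

```python
import numpy as np

def qr_least_squares(Phi, y):
    """Solve the linear output-weight problem min ||Phi w - y||_2 via QR
    decomposition: Phi = QR, then w = R^{-1} Q^T y."""
    Q, R = np.linalg.qr(Phi)
    return np.linalg.solve(R, Q.T @ y)
```

Working with the triangular factor R instead of the normal equations avoids squaring the condition number of the regressor matrix, which is the numerical motivation for QR-based recursive least squares.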
The rest of this paper is structured as follows: in Section 2, we formulate the problem of series prediction and discuss the basic ideas of the singular value decomposition (SVD) and QR techniques for compaction of information. The functionality of the proposed algorithm is described in Section 3 and experimental results obtained with it are discussed in Section 4. Finally, we make some concluding remarks and list the cited references.
Problem formulation
Throughout this article we will focus on a solution to time-series prediction problems. These may be formulated as follows: at a given time N, a “window” of τw past values of a dynamical process,

x_N = (x(N), x(N−1), …, x(N−τw+1)),

is considered, and the task is to predict a future value x(N+τh) based on this past information. Depending on the magnitude of the horizon τh, this task is referred to as “short-term” or “long-term” forecasting. Problems of this type arise in many areas of biological and
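Building the training pairs from a series under this formulation is mechanical; a minimal sketch (the helper name `make_windows` is ours):

```python
import numpy as np

def make_windows(series, tau_w, tau_h):
    """Build (input, target) pairs: each input is a window of tau_w past
    values (most recent first) and the target lies tau_h steps ahead."""
    X, y = [], []
    for n in range(tau_w - 1, len(series) - tau_h):
        X.append(series[n - tau_w + 1 : n + 1][::-1])  # x(n), x(n-1), ...
        y.append(series[n + tau_h])
    return np.array(X), np.array(y)
```

For a fixed window length τw, the choice of which lags inside the window actually matter is exactly the input-structure question the SVD/QR-cp machinery of Section 3 addresses.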
The algorithm
In this section a modification to Platt's RAN learning algorithm is introduced that makes use of SVD and QR-cp to adjust the lag structure of the network inputs and also to perform pruning of the nearly irrelevant RBFs that could, on occasion, slip into the network. The reader should note that, though explained in the particular context of the RAN algorithm, the discussion here could be adapted to extend any available RBF network procedure. We will first study how the algorithm determines the
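The column-selection role that QR-cp plays here can be illustrated with a greedy pivoted-QR sketch (a simplified, numpy-only stand-in for a library routine such as LAPACK's column-pivoted QR; the function name is ours):

```python
import numpy as np

def qr_cp_pivots(A, k):
    """Greedy QR with column pivoting: repeatedly pick the column with the
    largest residual norm, then orthogonalize the remaining columns
    against it. The first k pivots index the most informative columns
    (prediction lags, or RBF units, in the algorithm's two uses)."""
    A = A.astype(float).copy()
    remaining = list(range(A.shape[1]))
    pivots = []
    for _ in range(k):
        norms = [np.linalg.norm(A[:, j]) for j in remaining]
        j = remaining[int(np.argmax(norms))]
        pivots.append(j)
        remaining.remove(j)
        q = A[:, j] / np.linalg.norm(A[:, j])
        for m in remaining:
            A[:, m] -= (q @ A[:, m]) * q  # deflate against chosen pivot
    return pivots
```

Because each pivot is chosen after deflation, a column that is large but nearly parallel to an already-selected one is ranked low, which is what lets the method discard redundant lags and near-duplicate RBFs.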
Experiments and comparison
This section discusses the experiments performed using the above-described algorithm and compares its performance and characteristics to those of the original RAN algorithm. Throughout this discussion, we will refer to the solution of the long-term forecasting problem for the Mackey–Glass equation (2). A set of 8000 consecutive data points was obtained by simulating this equation with a=0.2, b=0.1, T=17 and x(t−T)=0.3 for t<T. A third-order Runge–Kutta method with step size 0.1 was used with
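A benchmark series of this kind can be generated as follows. This sketch integrates the Mackey–Glass delay differential equation dx/dt = a·x(t−T)/(1 + x(t−T)¹⁰) − b·x(t) with a simple Euler scheme for clarity (the experiments above use third-order Runge–Kutta); the function name is ours:

```python
import numpy as np

def mackey_glass(n_steps, a=0.2, b=0.1, tau=17.0, dt=0.1, x0=0.3):
    """Euler integration of the Mackey-Glass equation with constant
    history x(t) = x0 for t <= 0; returns n_steps samples from t = 0."""
    delay = int(tau / dt)                 # delay expressed in steps
    x = np.empty(n_steps + delay + 1)
    x[: delay + 1] = x0                   # constant initial history
    for t in range(delay, n_steps + delay):
        x_tau = x[t - delay]              # delayed state x(t - tau)
        x[t + 1] = x[t] + dt * (a * x_tau / (1.0 + x_tau ** 10) - b * x[t])
    return x[delay : delay + n_steps]
```

With these parameter values the trajectory is chaotic but bounded, which is what makes long-horizon prediction on this series a standard stress test for sequential learners.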
Conclusions
A detailed study has been made of the possibility of using the SVD and QR-cp techniques for compaction of information in two ways: the determination of an optimum set of prediction lags (which is equivalent to the appropriate input structure of the RBF network) and the pruning of irrelevant RBFs within the same network. We have also shown the effect of a new strategy for initial allocation of resources, which enables us to allocate most of the RBFs in the first stages in the form of small
Acknowledgements
This research was partially supported by the CICYT, Spain, under projects TIC97-1149 and TIC98-0982. We wish to thank Rafael José Yáñez Garcı́a of the Applied Mathematics Department at the University of Granada for his kindly provision of some of the bibliographic sources used in this paper. We are also very grateful to the anonymous reviewers for their constructive comments and helpful remarks.
References (18)
- et al., The Singular Value Decomposition — applied in the modeling and prediction of quasiperiodic processes, Signal Processing (1994)
- et al., Time Series Analysis: Forecasting and Control (1994)
- et al., Matrix Computations (1989)
- et al., A function estimation approach to sequential learning with neural networks, Neural Comput. (1993)
- et al., On the application of orthogonal transformation for the design and analysis of feedforward networks, IEEE Trans. Neural Networks (1995)
- et al., The Singular Value Decomposition: its computation and some applications, IEEE Trans. Automat. Control (1980)
- A. Levin, T.K. Leen, J.E. Moody, Fast pruning using principal components, in: Advances in Neural Information Processing...
- et al., Oscillation and chaos in physiological control systems, Science (1977)
- et al., Using the Karhunen–Loève transformation in the backpropagation training algorithm, IEEE Trans. Neural Networks (1991)
Moisés Salmerón received his B.Sc. degree in 1994 and his M.Sc. degree in 1997, both in Computer Science from the University of Granada, Spain. His last-year engineering project was about the design of testable neural networks. In September, 1997 he joined the CASIP (Circuits and Systems for Information Processing) research group at the Department of Architecture and Computer Technology, where he is a researcher working towards his Ph.D. dissertation which deals with neural networks for time series prediction. His current research interests are in the fields of neural networks, modern control theory, time series statistical modelling, and application of matricial techniques such as SVD, PCA and orthogonal transformations for the design of neural networks and hybrid systems for time series prediction.
Julio Ortega received a B.Sc. degree in electronic physics in 1985, an M.Sc. degree in electronics in 1986, and a Ph.D. degree in 1990, all from the University of Granada, Spain. His Ph.D. dissertation has won the Award of Ph.D. Dissertations of the University of Granada. He was at the Open University (UK), and at the Department of Electronics, University of Dortmund (Germany) as an invited researcher. Currently, he is an associate professor at the Department of Computer Architecture and Technology at the University of Granada. His research interests lie in the fields of artificial neural networks, fuzzy logic, evolutionary computation, parallel processing and parallel architectures. He is a member of the Computer Society of IEEE.
Carlos G. Puntonet received a B.Sc. degree in 1982, an M.Sc. degree in 1986, and his Ph.D. degree in 1994, all from the University of Granada, Spain. These degrees are in electronic physics. Currently, he is an Associate Professor at the Department of Computer Architecture and Technology at the University of Granada. His research interests lie in the fields of signal processing, independent component analysis and separation of sources using artificial neural networks.
Alberto Prieto received a B.Sc. degree in electronic physics in 1968 from the Complutense University (Madrid) and his Ph.D. degree from the University of Granada, Spain, in 1976, obtaining the Award of Ph.D. Dissertations and the Citema Foundation National Award. From 1969 to 1970 he was at the “Centro de Investigaciones Técnicas de Guipuzcoa” and at the “E.T.S.I Industriales” of San Sebastián. From 1971 to 1984 he was Director of the Computer Centre and from 1985 to 1990 Dean of the Computer Science and Technology studies of the Univ. of Granada. He is currently Full Professor and Director of the Department of Computer Architecture and Technology at the University of Granada. He has been visiting researcher in different foreign centres including the University of Rennes (1975, Prof. Boulaye), the Open University (UK) (1987, Prof. S.L. Hurst), the Institute National Polythechnique of Grenoble (1991, Profs. J. Herault and C. Jutten), and the University College of London (1991–92, Prof. P. Treleaven). His research interests are in the area of intelligent systems. Prof. Prieto was the Organization Chairman of the “International Workshop on Artificial Neural Networks (IWANN’91)” (Granada, September 17–19, 1991) and of the “7th International Conference on Microelectronics for Neural, Fuzzy and Bio-inspired Systems (MicroNeuro ’99) (Granada, Spain, April 7–9, 1999). He was nominated member of the IFIP WG (Neural Computer Systems) and Chairman of the Spanish RIG of the IEEE Neural Networks Council.
1. Partially supported by CICYT, Spain, under Project TIC97-1149.
2. Partially supported by CICYT, Spain, under Project TIC98-0982.