Learning methods for radial basis function networks
Introduction
Artificial neural networks represent an alternative, ‘sub-symbolic’ approach to problem solving. They are typically able to solve complex non-linear problems with multiple local extremes, often from noisy or incomplete data. Theoretical results concerning the approximation capabilities and complexity of neural networks are quite elaborate, but they still cannot answer all the relevant questions. The range of learning algorithms is wide, from simple vector quantization to genetic learning, and so far it has been difficult, if not impossible, to encompass such different approaches in a unifying theory. It is nevertheless desirable to gain insight into the applicability and behavior of various neural architectures and algorithms with respect to different classes of problems, which makes heuristics and experimental evaluations very important.
In our work we deal with the so-called radial basis function (RBF) networks, which represent an interesting alternative to the widely used multilayer perceptron (MLP) networks [2]. These networks are relatively new; they have been proved to possess similar approximation power while offering a richer spectrum of learning possibilities. In our previous work [3], [7], [9], we developed and examined several RBF learning algorithms, with an emphasis on hybrid techniques. The purpose of this paper is to describe and evaluate the most promising RBF network learning approaches. Three tasks from the well-known Proben1 benchmark database [10] have been selected for the experiments. The results of all tests are compared to previous work in this area, namely to MLP network results.
Radial basis function networks
By an RBF unit we mean a neuron with n real inputs x = (x_1, …, x_n) and one output y. Each unit is determined by an n-dimensional vector c, which is called the center. It can have an additional parameter b, usually called the width. The output y of an RBF unit is computed as y = φ(ξ), with ξ = ‖x − c‖ / b, where φ is a suitable activation function, typically the Gaussian φ(s) = e^{−s²}. For evaluating ‖x − c‖, the Euclidean norm is usually used. In this paper we consider a general weighted norm ‖x‖_C = ‖Cx‖ (so that ‖x‖_C² = xᵀCᵀCx), where C is an n×n matrix, instead of the Euclidean one.
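As a small sketch, the unit's response can be written down directly. The helper below assumes the Gaussian activation and lets an optional matrix C define the weighted norm; the function name and parameter conventions are ours, not the paper's:

```python
import numpy as np

def rbf_unit(x, c, b=1.0, C=None):
    """Output of one Gaussian RBF unit with center c and width b.

    C is an optional matrix defining a weighted norm ||x||_C = ||C x||;
    with C = None the usual Euclidean norm is used.
    """
    d = np.asarray(x, float) - np.asarray(c, float)
    if C is not None:
        d = np.asarray(C, float) @ d
    xi = np.linalg.norm(d) / b          # xi = ||x - c|| / b
    return np.exp(-xi ** 2)             # Gaussian activation phi(s) = e^{-s^2}

# A unit centered at the origin with width 1 responds with e^{-1} at unit distance:
print(rbf_unit([1.0, 0.0], [0.0, 0.0]))   # -> 0.36787944117144233
```

With C = diag(2, 1), for instance, distances along the first axis count double, so the unit's receptive field becomes an axis-aligned ellipse rather than a circle.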
Gradient learning
Perhaps the most straightforward method of RBF network learning is based on the well-known back-propagation algorithm for the multilayer perceptron. Back-propagation is a non-linear gradient descent algorithm that modifies all network parameters proportionally to the partial derivatives of the training error. The trick is a clever ordering of the parameters such that all partial derivatives can be computed successively. Since the RBF network has formally the same structure as the MLP,
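A minimal sketch of such gradient learning for a Gaussian-RBF network, with hand-derived gradients for the centers, widths, and output weights (full-batch updates; the initialization and learning-rate choices are our assumptions, not the paper's):

```python
import numpy as np

def rbf_forward(X, c, b, w):
    """Network output: a linear combination of Gaussian hidden units."""
    r2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(-1)   # squared distances (N, h)
    return np.exp(-r2 / b ** 2) @ w

def train_rbf_gradient(X, T, h=5, lr=0.1, epochs=500, seed=0):
    """Full-batch gradient descent on centers c, widths b, and weights w."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), h, replace=False)].astype(float)  # centers on data points
    b = np.ones(h)
    w = rng.normal(scale=0.1, size=(h, T.shape[1]))
    for _ in range(epochs):
        d = X[:, None, :] - c[None, :, :]          # (N, h, n)
        r2 = (d ** 2).sum(-1)
        phi = np.exp(-r2 / b ** 2)
        err = phi @ w - T                          # output minus target
        g = (err @ w.T) * phi                      # error signal at hidden units
        w -= lr * phi.T @ err / len(X)             # dE/dw = phi^T err
        c -= lr * np.einsum('nh,nhi->hi', 2 * g / b ** 2, d) / len(X)
        b -= lr * (2 * g * r2 / b ** 3).sum(0) / len(X)
    return c, b, w
```

On a toy one-dimensional regression problem, a few hundred epochs of these updates visibly reduce the training error relative to the freshly initialized network.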
Three-step learning
The gradient learning described in the previous section treats all parameters uniformly. Now we introduce a learning method that takes advantage of the well-defined meaning of the RBF network parameters (cf. [3], [6]). In this approach the learning process is divided into three consecutive steps according to the three distinct sets of network parameters. The first step consists of determining the hidden-unit centers. The positions of the centers should reflect the density of the data points,
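The three steps can be sketched as follows, assuming k-means for the centers, a nearest-center heuristic for the widths, and linear least squares for the output weights; the paper defers the exact rules to [3], [6], so these are common choices, not necessarily the authors':

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Step 1: place centers by simple k-means (vector quantization)."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        a = ((X[:, None, :] - c[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (a == j).any():            # keep the old center if a cluster is empty
                c[j] = X[a == j].mean(0)
    return c

def rbf_out(X, c, b, w):
    r2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(-1)
    return np.exp(-r2 / b ** 2) @ w

def three_step_fit(X, T, k):
    c = kmeans(X, k)                                  # step 1: centers
    dc = np.sqrt(((c[:, None, :] - c[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(dc, np.inf)
    b = np.maximum(dc.min(1), 1e-6)                   # step 2: width = distance
                                                      # to the nearest other center
    phi = np.exp(-((X[:, None, :] - c[None, :, :]) ** 2).sum(-1) / b ** 2)
    w, *_ = np.linalg.lstsq(phi, T, rcond=None)       # step 3: linear least squares
    return c, b, w
```

Because only the first two steps are non-linear and the third is a single linear solve, this scheme is typically much faster than full gradient descent over all parameters.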
Evolutionary learning
The third learning method introduced in this paper is based on a genetic algorithm (GA). The GA is a general stochastic optimization method inspired by natural evolution. A genetic algorithm works with a population of individuals. An individual (see Fig. 1) represents a vector of encoded values for all parameters of one RBF network. Each individual in the population is associated with the value of the error function of the corresponding network, which reflects its quality as a solution to the given
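A toy version of the scheme, with each individual encoded as a flat real vector and evaluated by decoding it into an RBF network; the operators used here (binary tournaments, arithmetic crossover, Gaussian mutation, elitism) are illustrative assumptions, not the paper's exact GA:

```python
import numpy as np

def decode_and_predict(p, X, h, m):
    """Decode a flat parameter vector into centers, widths, weights; evaluate."""
    n = X.shape[1]
    c = p[:h * n].reshape(h, n)
    b = np.abs(p[h * n:h * n + h]) + 1e-3           # keep widths positive
    w = p[h * n + h:].reshape(h, m)
    r2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(-1)
    return np.exp(-r2 / b ** 2) @ w

def ga_train(X, T, h=5, pop=40, gens=100, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], T.shape[1]
    P = rng.normal(size=(pop, h * n + h + h * m))   # initial random population
    def err(p):                                     # fitness = training error
        return ((decode_and_predict(p, X, h, m) - T) ** 2).mean()
    hist = []
    for _ in range(gens):
        E = np.array([err(p) for p in P])
        hist.append(E.min())
        i, j = rng.integers(pop, size=(2, pop))     # binary tournament selection
        parents = np.where((E[i] < E[j])[:, None], P[i], P[j])
        a = rng.random((pop, 1))                    # arithmetic crossover
        kids = a * parents + (1 - a) * parents[rng.permutation(pop)]
        kids += rng.normal(scale=0.1, size=kids.shape)   # Gaussian mutation
        kids[0] = P[E.argmin()]                     # elitism: keep the best so far
        P = kids
    E = np.array([err(p) for p in P])
    hist.append(E.min())
    return P[E.argmin()], hist
```

Thanks to elitism, the best error recorded in `hist` never increases, though the GA needs many more error evaluations per unit of progress than the gradient methods.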
Methodology and data
In order to achieve a high degree of comparability of our results, we have selected frequently used tasks for the evaluation of learning algorithms. In particular, we have used the well-known two-spirals problem, together with three tasks from the Proben1 database: the cancer and glass classification tasks, and the heart approximation problem. Moreover, each of the Proben1 data sets is available in three different orderings, defining different data partitions for training and testing. These are
The two spirals task
The two-spirals problem has been used to demonstrate the advantages of RBF units with weighted norms. The geometry of the problem is well suited to adapting the shape of the areas controlled by individual units. Table 2 compares three types of networks: a 100-unit network using Euclidean-norm units, and two networks with the weighted norm, with 50 and 70 units, respectively. The learning-speed results show that the 70-unit weighted-norm network outperforms the 100-unit network with the Euclidean norm
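For reference, the two interleaved spirals are usually generated with the classic construction below (97 points per spiral, radius decaying as the angle grows); this is the standard benchmark recipe, not necessarily the exact data set used in the paper:

```python
import numpy as np

def two_spirals(n=97, noise=0.0, seed=0):
    """Two interleaved spirals, n points per class, radii up to 6.5."""
    rng = np.random.default_rng(seed)
    i = np.arange(n)
    phi = i / 16.0 * np.pi                 # angle grows along the spiral
    r = 6.5 * (104 - i) / 104.0            # radius decays toward the center
    x = np.c_[r * np.cos(phi), r * np.sin(phi)]
    x += rng.normal(scale=noise, size=x.shape)
    X = np.vstack([x, -x])                 # second spiral mirrors the first
    y = np.hstack([np.zeros(n), np.ones(n)])
    return X, y
```

The two classes are not linearly separable and wind around each other three times, which is why localized units whose receptive-field shape can adapt to the spiral arms do well here.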
The cancer task
A number of different learning algorithms have been tested on the cancer data. Table 3 and Table 4 show the results of the gradient algorithm run for a network with 5 units. Both the normalized error and the classification error on the training and testing sets are shown, together with mean and standard deviation values. The time for 100 iterations was approximately 1.5 s. Table 4 and Fig. 4 show the results of the three-step learning algorithm. This time, networks with 5, 10, 20, and 50 units have been used. Times for 100
The glass task
For the glass data set, the gradient learning, the three-step learning, and their combination have been tested. The gradient algorithm results are presented in Table 6 and Fig. 6. The time for 100 iterations of gradient learning is 4.3 s (for 10 units) and 9.1 s (for 15 units). The results for three-step learning are gathered in Table 7; times for 100 iterations are 24 s (15 units), 94 s (30 units), and 250 s (50 units). Both algorithms behave well; a larger number of units means smaller errors in all
The heart task
The heart data represent an approximation problem with quite a large input space. (In the Proben1 documentation this approximation version of the data set is referred to as the “hearta” task.) The gradient learning and the three-step learning have been tested for networks with 30, 40, and 50 units. The results are presented in Table 8. The gradient learning shows roughly the same error regardless of the number of units, while with the three-step learning the error decreases with the number of units.
Conclusions
Let us generalize the observations from the tests performed. A comparison of the performance of most of our algorithms is summarized in Table 9 (Fig. 7). The gradient-based algorithm is usually able to achieve better results in terms of the error measured on both the training and the testing set. The three-step learning is usually the fastest method, thanks to the unsupervised phase that sets the centers and the quite simple algorithm that sets the output weights. The errors
Acknowledgements
This work has been partially supported by the Grant Agency of the Czech Republic under grants 201/03/P163 and 201/02/0428.
Roman Neruda received his PhD in 1998 at the Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague. He works as a researcher at the ICS Prague and teaches at Charles University in Prague. His current research interests include hybrid computational methods, especially combinations of neural networks and evolutionary algorithms, and adaptive agents.
References (10)
- Vector quantization, IEEE ASSP Mag. (1984)
- Neural Networks (1994)
- et al., Radial basis function networks, Neural Network World (1993)
- Institute of Computer Science, ASCR, Prague, on-line documentation of the Bang3 system, ...
- P. Kudová, Configurations of RBF experiments, ...
Petra Kudová received her MSc in 2001. She is currently a PhD student and a research assistant at the Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague. Her research interests include learning and approximation capabilities of neural networks, RBF networks, regularization, and multi-agent systems.