Learning methods for radial basis function networks

https://doi.org/10.1016/j.future.2004.03.013

Abstract

RBF networks represent a vital alternative to the widely used multilayer perceptron neural networks. In this paper we present and examine several learning methods for RBF networks and their combinations. Gradient-based learning, a three-step algorithm with an unsupervised part, and an evolutionary algorithm are introduced, and their performance is compared on benchmark problems from the Proben1 database. The results show that the three-step learning is usually the fastest, while the gradient learning achieves better precision. The best results are achieved by hybrid approaches that combine the presented methods.

Introduction

Artificial neural networks represent an alternative, ‘sub-symbolic’ approach to problem solving. They are typically able to solve complex non-linear problems possessing multiple local extremes, often based on noisy or incomplete data. Theoretical results concerning the approximation capabilities and complexity of neural networks are quite elaborate, but they still cannot answer all our relevant questions. The range of learning algorithms is quite wide, from simple vector quantization to genetic learning. So far, it is difficult, if not impossible, to encompass such different approaches by a unifying theory. Yet, it is desirable to gain insight into the applicability and behavior of various neural architectures and algorithms with respect to different classes of problems. This makes heuristics and experimental evaluations very important.

In our work we deal with the so-called radial basis function (RBF) networks, which represent an interesting alternative to the widely used multilayer perceptron (MLP) networks [2]. These networks are relatively new; they have been proved to possess similar approximation power and offer a richer spectrum of learning possibilities. In our previous work [3], [7], [9], we have developed and examined some of the RBF learning algorithms with an emphasis on hybrid techniques. The purpose of this paper is to describe and evaluate the most promising RBF network learning approaches. Three tasks from the well-known Proben1 [10] benchmark database have been selected for the experiments. The results of all tests are compared to previous work in this area, namely the MLP network results.

Section snippets

Radial basis function networks

By an RBF unit we mean a neuron with multiple real inputs x = (x₁, …, xₙ) and one output y. Each unit is determined by an n-dimensional vector c called the center, and it can have an additional parameter b usually called the width. The output y of an RBF unit is computed as y = ϕ(ξ), where ξ = ‖x − c‖/b and ϕ: ℝ → ℝ is a suitable activation function, typically the Gaussian ϕ(z) = e^(−z²). For evaluating ‖x − c‖, the Euclidean norm is usually used. In this paper we consider a general weighted norm instead of the
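
To make the unit definition concrete, the following sketch (using NumPy; the function name and the matrix form of the weighted norm are illustrative assumptions, not taken from the paper) evaluates a Gaussian RBF unit under either the Euclidean or a weighted norm.

```python
import numpy as np

def rbf_unit_output(x, c, b=1.0, W=None):
    """Output of a single Gaussian RBF unit.

    x : input vector
    c : center vector of the unit
    b : width parameter
    W : optional norm matrix; if given, the weighted norm
        sqrt((x-c)^T W^T W (x-c)) replaces the Euclidean norm.
    """
    d = x - c
    if W is None:
        xi = np.linalg.norm(d) / b               # Euclidean norm
    else:
        xi = np.sqrt(d @ (W.T @ W) @ d) / b      # weighted norm
    return np.exp(-xi ** 2)                      # Gaussian activation phi(z) = exp(-z^2)

# Example: a 2-D unit centered at the origin
print(rbf_unit_output(np.array([0.5, 0.5]), np.zeros(2), b=1.0))
```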

Gradient learning

Perhaps the most straightforward method of RBF network learning is based on the well-known back-propagation algorithm for the multilayer perceptron. Back-propagation is a non-linear gradient descent algorithm that modifies all network parameters proportionally to the partial derivatives of the training error. The trick is in a clever ordering of the parameters such that all partial derivatives can be computed consecutively. Since the RBF network has formally the same structure as the MLP,
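
As a rough illustration of how such a gradient descent can look for an RBF network with Gaussian units and Euclidean norms, the following NumPy sketch updates the output weights, centers, and widths from the partial derivatives of a sum-of-squares error. The learning rate, the initialization, and the per-pattern update scheme are assumptions made for the example, not details taken from the paper.

```python
import numpy as np

def gaussian(x, C, b):
    """Hidden-layer activations: one Gaussian unit per row of centers C."""
    d2 = ((x - C) ** 2).sum(axis=1)              # squared Euclidean distances
    return np.exp(-d2 / b ** 2)                  # phi(xi) = exp(-xi^2), xi = ||x-c||/b

def train_gradient(X, T, h, lr=0.05, epochs=100, seed=0):
    """Plain per-pattern gradient descent for an RBF network (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), h, replace=False)].astype(float)  # centers on data points
    b = np.ones(h)                                              # widths
    W = rng.normal(scale=0.1, size=(h, T.shape[1]))             # output weights
    for _ in range(epochs):
        for x, t in zip(X, T):
            phi = gaussian(x, C, b)
            e = phi @ W - t                                     # output error
            back = (W @ e) * phi                                # error pushed to hidden units
            W -= lr * np.outer(phi, e)
            C -= lr * 2 * back[:, None] * (x - C) / b[:, None] ** 2
            b -= lr * 2 * back * ((x - C) ** 2).sum(axis=1) / b ** 3
    return C, b, W
```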

Three-step learning

The gradient learning described in the previous section treats all parameters in the same way. Now we introduce a learning method that takes advantage of the well-defined meaning of the RBF network parameters (cf. [3], [6]). In this approach the learning process is divided into three consecutive steps according to the three distinct sets of network parameters. The first step consists of determining the hidden unit centers. The positions of the centers should reflect the density of the data points,
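
A minimal sketch of the three-step scheme is given below, assuming k-means as the vector-quantization step, a nearest-centers heuristic for the widths, and linear least squares for the output weights; the concrete choice in each step is an assumption made for illustration, since the snippet does not fix them.

```python
import numpy as np

def kmeans(X, h, iters=20, seed=0):
    """Very small k-means used here as the vector-quantization step (step 1)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), h, replace=False)].astype(float)
    for _ in range(iters):
        labels = ((X[:, None] - C[None, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(h):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C

def train_three_step(X, T, h, q=2):
    """Three-step RBF learning: centers, then widths, then output weights."""
    C = kmeans(X, h)                                       # step 1: unsupervised centers
    D = np.linalg.norm(C[:, None] - C[None, :], axis=2)    # step 2: widths from the
    D.sort(axis=1)                                         #   distances to the q nearest
    b = D[:, 1:q + 1].mean(axis=1)                         #   centers (heuristic)
    d2 = ((X[:, None] - C[None, :]) ** 2).sum(axis=2)      # step 3: linear least squares
    Phi = np.exp(-d2 / b ** 2)                             #   on the hidden-layer outputs
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return C, b, W
```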

Evolutionary learning

The third learning method introduced in this paper is based on a genetic algorithm (GA). The GA is a general stochastic optimization method inspired by natural evolution. A genetic algorithm works with a population of individuals. An individual (see Fig. 1) represents a vector of encoded values of all parameters of one RBF network. Each individual in the population is associated with the value of the error function of the corresponding network, which reflects its quality as a solution to the given
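
The following toy sketch illustrates this kind of evolutionary learning: each individual is a flat real vector encoding the centers, widths, and output weights of one RBF network, and its fitness is the network's mean squared error. Tournament selection, uniform crossover, Gaussian mutation, and all numeric settings are assumptions for the example; the paper's actual GA operators and encoding may differ.

```python
import numpy as np

def decode(genome, n, h, m):
    """Split a flat parameter vector into centers, widths, and output weights."""
    C = genome[:h * n].reshape(h, n)
    b = np.abs(genome[h * n:h * n + h]) + 1e-6     # keep widths positive
    W = genome[h * n + h:].reshape(h, m)
    return C, b, W

def network_error(genome, X, T, n, h, m):
    """Fitness of one individual: mean squared error of the decoded network."""
    C, b, W = decode(genome, n, h, m)
    d2 = ((X[:, None] - C[None, :]) ** 2).sum(axis=2)
    Y = np.exp(-d2 / b ** 2) @ W
    return ((Y - T) ** 2).mean()

def evolve(X, T, h, pop_size=50, generations=200, seed=0):
    """Toy real-coded GA: tournament selection, uniform crossover, Gaussian mutation."""
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], T.shape[1]
    length = h * n + h + h * m
    pop = rng.normal(size=(pop_size, length))
    for _ in range(generations):
        fitness = np.array([network_error(ind, X, T, n, h, m) for ind in pop])
        new_pop = [pop[fitness.argmin()].copy()]                  # elitism
        while len(new_pop) < pop_size:
            i, j = rng.choice(pop_size, 2), rng.choice(pop_size, 2)
            p1 = pop[i[fitness[i].argmin()]]                      # tournament selection
            p2 = pop[j[fitness[j].argmin()]]
            mask = rng.random(length) < 0.5                       # uniform crossover
            child = np.where(mask, p1, p2)
            child = child + rng.normal(scale=0.1, size=length) * (rng.random(length) < 0.05)
            new_pop.append(child)
        pop = np.array(new_pop)
    best = pop[np.argmin([network_error(ind, X, T, n, h, m) for ind in pop])]
    return decode(best, n, h, m)
```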

Methodology and data

In order to achieve a high degree of comparability of our results, we have selected frequently used tasks for the evaluation of learning algorithms. In particular, we have used the well-known two-spirals problem, together with three tasks from the Proben1 database: the cancer and glass classification tasks, and the heart approximation problem. Moreover, each of the Proben1 data sets is available in three different orderings defining different data partitions for training and testing. These are

The two spirals task

The two-spirals problem has been used to demonstrate the advantages of using RBF units with weighted norms. The geometry of the problem is suitable for adapting the shape of the areas controlled by individual units. Table 2 compares three types of networks: a 100-unit network using Euclidean-norm units, and two networks with the weighted norm, with 50 and 70 units. The learning speed results show that the 70-unit weighted-norm network outperforms the 100-unit network with the Euclidean norm
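
For readers who want to reproduce the setting, a common parametrisation of the two-spirals data can be generated as follows; the exact sampling (number of points, number of turns, noise level) used in the paper's experiments is not specified in this snippet, so the values below are placeholders.

```python
import numpy as np

def two_spirals(n_per_spiral=97, turns=1.5, noise=0.0, seed=0):
    """Generate two interleaved spirals with class labels 0 and 1 (placeholder values)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.1, turns * 2 * np.pi, n_per_spiral)   # angle along each spiral
    r = t / (turns * 2 * np.pi)                              # radius grows with the angle
    s1 = np.stack([r * np.cos(t), r * np.sin(t)], axis=1)
    s2 = -s1                                                  # second spiral rotated by 180 degrees
    X = np.vstack([s1, s2]) + rng.normal(scale=noise, size=(2 * n_per_spiral, 2))
    y = np.hstack([np.zeros(n_per_spiral), np.ones(n_per_spiral)])
    return X, y
```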

The cancer task

A number of different learning algorithms have been tested on the cancer data. Table 3 and Table 4 show results of the gradient algorithm run for a network with 5 units. Both the normalized error and the classification error on the training and testing sets are shown, together with mean and standard deviation values. The time for 100 iterations was approximately 1.5 s. Table 4 and Fig. 4 show the results of the three-step learning algorithm. This time, networks with 5, 10, 20 and 50 units have been used. Times for 100
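
The snippet does not define the two reported measures, so the following stand-ins (an ordinary mean squared error and the arg-max misclassification rate, with one-hot targets assumed) only indicate how such quantities can be computed; Proben1's own normalization may differ.

```python
import numpy as np

def mse(Y, T):
    """Mean squared error over all patterns and outputs (stand-in for the
    normalized error reported in the tables; the exact Proben1 normalization
    is not defined in this snippet)."""
    return ((Y - T) ** 2).mean()

def classification_error(Y, T):
    """Fraction of patterns whose predicted class (arg-max output) is wrong,
    assuming one-hot target vectors."""
    return (Y.argmax(axis=1) != T.argmax(axis=1)).mean()
```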

The glass task

For the glass data set, the gradient learning, the three-step learning, and their combination have been tested. The gradient algorithm results are presented in Table 6 and Fig. 6. The time for 100 iterations of the gradient learning is 4.3 s (for 10 units) and 9.1 s (for 15 units). Results for the three-step learning are gathered in Table 7. Times for 100 iterations are 24 s (15 units), 94 s (30 units), and 250 s (50 units). Both algorithms behave well; a larger number of units means smaller errors in all

The heart task

The heart data represent an approximation problem with a quite large input space. (In the Proben1 documentation this approximation version of the data set is referred to as the “hearta” task.) The gradient learning and the three-step learning have been tested for networks with 30, 40, and 50 units. Results are presented in Table 8. The gradient learning shows roughly the same error regardless of the number of units, while with the three-step learning the error decreases with the number of units.

Conclusions

Let us generalize the observations from the tests performed and conclude with the following results. A comparison of the performance of most of our algorithms is summarized in Table 9 (Fig. 7). The gradient-based algorithm is usually able to achieve better results in terms of the error measured on both the training and testing sets. The three-step learning is usually the fastest method, due to the unsupervised phase that sets the centers and the quite simple algorithm that sets the output weights. The errors

Acknowledgements

This work has been partially supported by Grant Agency of the Czech Republic under grants 201/03/P163 and 201/02/0428.

References (10)

  • R.M. Gray, Vector quantization, IEEE ASSP Mag. (1984)
  • S. Haykin, Neural Networks (1994)
  • K. Hlaváčková et al., Radial basis function networks, Neural Network World (1993)
  • Institute of Computer Science, ASCR, Prague, On-line documentation of the Bang3 system, ...
  • P. Kudová, Configurations of RBF experiments, ...
There are more references available in the full text version of this article.

Roman Neruda received his PhD in 1998 at the Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague. He works as a researcher at the ICS Prague and teaches at Charles University in Prague. His current research interests include hybrid computational methods, especially combinations of neural networks and evolutionary algorithms, and adaptive agents.

Petra Kudová received her MSc in 2001. She is currently a PhD student and a research assistant at the Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague. Her research interests include learning and approximation capabilities of neural networks, RBF networks, regularization, and multi-agent systems.
