
Neural Networks

Volume 16, Issue 1, January 2003, Pages 69-77

A new EM-based training algorithm for RBF networks

https://doi.org/10.1016/S0893-6080(02)00215-0

Abstract

In this paper, we propose a new Expectation–Maximization (EM) algorithm that speeds up the training of feedforward networks with local activation functions, such as the Radial Basis Function (RBF) network. In previously proposed approaches, at each E-step the residual is decomposed equally among the units or proportionally to the weights of the output layer. However, these approaches tend to slow down the training of networks with local activation units. To overcome this drawback, in this paper we use a new E-step that applies a soft decomposition of the residual among the units. In particular, the decoupling variables are estimated as the posterior probability of a component given an input–output pattern. This adaptive decomposition takes into account the local nature of the activation functions and improves convergence by allowing the RBF units to focus on different subregions of the input space. The proposed EM training algorithm has been applied to the nonlinear modeling of a MESFET transistor.

Introduction

Radial Basis Function (RBF) networks have become one of the most popular feedforward neural networks, with applications in regression, classification and function approximation problems (Bishop, 1997; Haykin, 1994). The RBF network approximates nonlinear mappings by weighted sums of Gaussian kernels. Therefore, an RBF learning algorithm must estimate the centers of the units, their variances and the weights of the output layer. Typically, the learning process is separated into two steps: first, a nonlinear optimization procedure to select the centers and the variances and, second, a linear optimization step to fix the output weights. To simplify the nonlinear optimization step, the variances are usually fixed in advance and the centers are selected at random (Broomhead & Lowe, 1988) or by applying a clustering algorithm (Moody & Darken, 1989).
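As an illustration of this classical two-step procedure, the following sketch selects the centers at random from the data, fixes a common width in advance, and then obtains the output weights by linear least squares. The function names and the width heuristic are illustrative assumptions, not a prescription taken from the references above.

```python
# Minimal sketch of two-step RBF training: random centers + linear least squares.
# The width heuristic and function names are assumptions made for illustration.
import numpy as np

def train_rbf_two_step(x, y, n_units=10, width=None, seed=0):
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=n_units, replace=False)   # step 1: random centers
    if width is None:
        width = (x.max() - x.min()) / n_units               # crude fixed width
    # Design matrix of Gaussian activations, one column per unit
    phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (width ** 2))
    weights, *_ = np.linalg.lstsq(phi, y, rcond=None)       # step 2: linear fit
    return centers, width, weights

def rbf_predict(x, centers, width, weights):
    phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (width ** 2))
    return phi @ weights

# Example: approximate a 1-D nonlinear mapping and report the training MSE
x = np.linspace(-3.0, 3.0, 200)
y = np.sinc(x)
c, s, w = train_rbf_two_step(x, y, n_units=12)
print(np.mean((rbf_predict(x, c, s, w) - y) ** 2))
```

In practice the widths are often set from the spread of the selected centers; any reasonable heuristic suffices for this illustration.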

Other approaches try to solve the global nonlinear optimization problem using supervised (gradient-based) procedures that estimate the network parameters by minimizing the mean square error (MSE) between the desired output and the output of the network (Karayiannis, 1997; Lowe, 1989; Santamaría et al., 1999). However, gradient descent techniques tend to be computationally complex and to suffer from local minima.

As an alternative to global optimization procedures, a general and powerful method such as the Expectation–Maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977) can be applied to obtain maximum likelihood (ML) estimates of the network parameters. In the neural networks literature, the EM algorithm has been applied to a number of problems: supervised/unsupervised learning, classification/function approximation, etc. Here we concentrate on its application to supervised learning in function approximation problems. In this context, Jordan and Jacobs (1994) proposed to use the EM algorithm to train the mixture-of-experts architecture for regression problems. The EM algorithm has also been applied to estimate the input/output joint pdf, modeled through a Gaussian mixture model, and then to obtain the regressor from the conditional pdf (Ghahramani & Jordan, 1994). In both cases the missing data select the most likely member of the mixture given the observations, and then each member is trained independently.

More recently, the EM algorithm has been applied to the efficient training of feedforward and recurrent networks (Ma & Ji, 1998; Ma et al., 1997). The work in Ma et al. (1997) connects to the previous work of Feder and Weinstein (1988) on estimating superimposed signals in noise. In both methods, the E-step reduces to decomposing, at each iteration, the total residual into G components (G being the number of neurons). In Feder and Weinstein (1988), the variables used to decompose the residual can take arbitrary values, as long as they sum to one, but must be constant over the function domain: for instance, they propose to decompose the residual into G equal components. In Ma et al. (1997), the residual is decomposed proportionally to the weights of the output layer. Both approaches work well for feedforward networks with global activation functions such as the MLP, but tend to be rather slow for networks with local activation functions, since each individual unit is forced to approximate regions far away from the domain of its activation function.
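A minimal sketch of this E-step residual decomposition, with notation and function names chosen here for illustration, is given below; scheme="equal" corresponds to the constant decomposition of Feder and Weinstein (1988) and scheme="weights" to the weight-proportional decomposition of Ma et al. (1997).

```python
# Sketch of the residual decomposition used in the EM approaches discussed above.
# Each unit i receives a pseudo-target: its own current output plus a share
# beta_i of the global residual, with the betas summing to one.
import numpy as np

def decompose_residual(y, unit_outputs, lambdas, scheme="equal"):
    """y: (N,) targets; unit_outputs: (N, G) with lambda_i * o_i(x) in column i."""
    G = unit_outputs.shape[1]
    residual = y - unit_outputs.sum(axis=1)              # global fitting error
    if scheme == "equal":                                # Feder & Weinstein (1988)
        betas = np.full(G, 1.0 / G)
    elif scheme == "weights":                            # Ma et al. (1997)
        betas = np.abs(lambdas) / np.abs(lambdas).sum()
    else:
        raise ValueError("unknown scheme")
    # Pseudo-target for each unit; the M-step then refits each unit independently
    return unit_outputs + residual[:, None] * betas[None, :]
```

After this E-step, the M-step refits each unit independently to its pseudo-target, which is what decouples the overall optimization.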

To overcome this drawback, we propose in this paper a new EM algorithm, specific to RBF networks, which aims to accelerate convergence. We perform a soft decomposition of the residual that takes into account the locality of the basis functions. Several examples show that this modification speeds up convergence in comparison with previous EM approaches.

The paper is organized as follows. In Section 2, the main features of the EM algorithm are presented. In Section 3, we review some EM-based approaches for the training of feedforward neural networks. In Section 4, the EM algorithm is applied to train an RBF network taking advantage of the local nature of its activation function. Simulation results are provided in Section 5 to validate the proposed algorithm. In Section 6, we apply this algorithm to the small-signal modeling of a MESFET transistor to reproduce its intermodulation behavior. Finally, the main conclusions are presented in Section 7.

Section snippets

The EM algorithm

The EM algorithm (Dempster et al., 1977) is a general method for ML estimation of parameters given incomplete data. The word incomplete indicates that, in this formulation, it is convenient to associate two sets of random variables with the problem, Y and V, only one of which, Y, is directly observable, while the underlying model is expressed in terms of both sets, i.e. in terms of Z={Y,V}. In the original formulation of the EM algorithm, Y was called the incomplete data, V the missing data, and Z the complete data.
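For reference, the two steps of the standard EM iteration, written in the notation above (θ denotes the parameter vector, the superscript k the iteration index, and f_Z the pdf of the complete data), are:

```latex
% Standard EM iteration (general form, not specific to this paper)
\text{E-step:}\quad Q\bigl(\theta \mid \theta^{(k)}\bigr)
    = \mathrm{E}\!\left[\log f_Z(Z;\theta) \,\middle|\, Y = y;\ \theta^{(k)}\right]
\qquad
\text{M-step:}\quad \theta^{(k+1)}
    = \arg\max_{\theta}\; Q\bigl(\theta \mid \theta^{(k)}\bigr)
```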

EM-based training of feedforward networks

In this section we introduce the notation and describe previous work on training two-layer feedforward networks using EM-based approaches. Without loss of generality, let us consider an RBF network with G Gaussian units, which approximates a one-dimensional mapping, g(x): R → R, as

\tilde{g}(x) = \sum_{i=1}^{G} \lambda_i\, o_i(x),

where i indexes the RBF units, λ_i is the amplitude, and o_i(x) is the activation function of each unit, given by

o_i(x) = \exp\!\left(-\frac{(x-\mu_i)^2}{\sigma_i^2}\right).

Our training problem consists in estimating the amplitudes, centers and variances of the G units.
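To connect this model with the EM framework of the previous section, the overall approximation error can be split into per-unit components, in the spirit of the superimposed-signals formulation of Feder and Weinstein (1988); the error symbols e_p and e_{i,p} below are introduced here only for illustration.

```latex
% Illustrative complete-data decomposition for a pattern (x_p, y_p);
% e_p and e_{i,p} are symbols introduced here, not taken verbatim from the paper.
y_p = \sum_{i=1}^{G} \lambda_i\, o_i(x_p) + e_p,
\qquad e_p = \sum_{i=1}^{G} e_{i,p},
\qquad y_{i,p} = \lambda_i\, o_i(x_p) + e_{i,p},
\qquad \sum_{i=1}^{G} y_{i,p} = y_p .
```

The missing data are then the per-unit targets y_{i,p}, and the decoupling variables determine how the observed residual e_p is shared among them.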

Fast EM training of RBF networks

The decoupling variables discussed in the previous section, which are constant over the whole input space, have provided good results for feedforward neural networks with nonlocal activation functions, such as the Multilayer Perceptron (MLP). However, they are not well suited to networks with local activation functions, such as the RBF. For this type of network, convergence is slow because, with the previous decoupling variables, at each M-step we are trying to fit a Gaussian to a very large region of the input space.
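One way to picture the soft, input-dependent decomposition proposed here is sketched below: each unit receives a share of the residual at a pattern proportional to a normalized, posterior-like responsibility of that unit for the pattern. Taking the responsibility proportional to the unit's Gaussian activation is an illustrative approximation; the paper estimates the decoupling variables as the posterior probability of each unit given the complete data.

```python
# Sketch of a soft, input-dependent E-step for a 1-D RBF network.
# The responsibilities used here (normalized activations) are an illustrative
# stand-in for the posterior probabilities derived in the paper.
import numpy as np

def soft_decompose_residual(y, x, centers, widths, lambdas):
    # Gaussian activations o_i(x_p), shape (N, G)
    act = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (widths[None, :] ** 2))
    outputs = act * lambdas[None, :]                     # lambda_i * o_i(x_p)
    residual = y - outputs.sum(axis=1)                   # global residual
    # Input-dependent shares: one set of betas per pattern, summing to one
    resp = act / np.clip(act.sum(axis=1, keepdims=True), 1e-12, None)
    # Each unit only "sees" the residual inside its own region of the input space
    return outputs + residual[:, None] * resp
```

Because the shares vanish far from a unit's center, each M-step only asks that unit to explain errors in its own neighborhood, which is the mechanism behind the faster convergence reported in the paper.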

Experimental results

In this experiment we consider the set of eight 2-D functions used in Cherkassky, Gehring, and Mulier (1996) to compare the performance of several adaptive methods. These functions, which form a suitable test set, are described in Table 1. We use a generalized radial basis function (GRBF) network, which allows a different variance along each input dimension.
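For concreteness, a GRBF unit of this kind is an axis-aligned Gaussian with a separate width per input dimension; the sketch below shows how its activations can be evaluated, with array shapes and names chosen here for illustration.

```python
# Sketch of generalized RBF (GRBF) activations with per-dimension widths.
import numpy as np

def grbf_activations(X, centers, widths):
    """X: (N, D) inputs; centers and widths: (G, D), one row per unit."""
    diff = X[:, None, :] - centers[None, :, :]           # (N, G, D)
    z = (diff ** 2) / (widths[None, :, :] ** 2)          # scaled per dimension
    return np.exp(-z.sum(axis=2))                        # (N, G) activations

# Example with 2-D inputs, as in the Cherkassky et al. (1996) test functions
X = np.random.default_rng(0).uniform(0.0, 1.0, size=(5, 2))
centers = np.array([[0.3, 0.3], [0.7, 0.7]])
widths = np.array([[0.2, 0.1], [0.1, 0.3]])
print(grbf_activations(X, centers, widths).shape)        # (5, 2)
```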

First, we compare the performance of the proposed soft-EM approach with the classical EM alternatives (Feder & Weinstein, 1988; Ma et al., 1997).

Nonlinear small-signal modeling of a MESFET for intermodulation distortion characterization

In this section, a GRBF network trained with the proposed soft-EM procedure is used to reproduce the small-signal intermodulation behavior of a microwave MESFET transistor. Fig. 2 shows the most widely accepted equivalent nonlinear circuit of a MESFET in its saturated region. The predominant nonlinearity in this model is the drain-to-source current Ids, which depends on both the drain-to-source voltage, Vds, and the gate-to-source voltage, Vgs. Here we are going to model this static nonlinearity.

Conclusions and future work

The decoupling variables used in the E-step of EM-based learning algorithms can be selected to control the rate of convergence of the algorithm. We have studied in this paper a suitable selection of these variables for feedforward networks with local activation functions (mainly, RBF networks). Specifically, these variables are estimated as the posterior probability of each RBF unit given each pattern of the selected complete data. By means of several simulation examples, it has been shown that this soft decomposition speeds up convergence with respect to previous EM-based training approaches.

Acknowledgements

This work has been partially supported by the European Community and the Spanish Government through FEDER project 1FD97-1863-C02-01. The authors also thank the reviewers for carefully reading the manuscript and for their many helpful comments.

References (18)

  • S. Ma et al. An efficient EM-based training algorithm for feedforward neural networks. Neural Networks (1997)

  • I. Santamaría et al. A nonlinear MESFET model for intermodulation analysis using a generalized radial basis function network. Neurocomputing (1999)

  • C. Bishop. Neural networks for pattern recognition (1997)

  • D.S. Broomhead et al. Multivariable functional interpolation and adaptive networks. Complex Systems (1988)

  • S. Chen et al. Orthogonal least squares learning algorithm for radial basis functions. IEEE Transactions on Neural Networks (1991)

  • V. Cherkassky et al. Comparison of adaptive methods for function estimation from samples. IEEE Transactions on Neural Networks (1996)

  • A.P. Dempster et al. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (1977)

  • M. Feder et al. Parameter estimation of superimposed signals using the EM algorithm. IEEE Transactions on Acoustics, Speech and Signal Processing (1988)

  • Z. Ghahramani et al. Supervised learning from incomplete data via an EM approach. Advances in Neural Information Processing Systems (1994)
