
Neural Networks

Volume 16, Issue 1, January 2003, Pages 69-77

A new EM-based training algorithm for RBF networks

https://doi.org/10.1016/S0893-6080(02)00215-0

Abstract

In this paper, we propose a new Expectation–Maximization (EM) algorithm that speeds up the training of feedforward networks with local activation functions, such as the Radial Basis Function (RBF) network. In previously proposed approaches, at each E-step the residual is decomposed equally among the units or proportionally to the weights of the output layer. However, these approaches tend to slow down the training of networks with local activation units. To overcome this drawback, in this paper we use a new E-step that applies a soft decomposition of the residual among the units. In particular, the decoupling variables are estimated as the posterior probability of a component given an input–output pattern. This adaptive decomposition takes into account the local nature of the activation functions and improves convergence by allowing the RBF units to focus on different subregions of the input space. The proposed EM training algorithm has been applied to the nonlinear modeling of a MESFET transistor.

Introduction

Radial Basis Function (RBF) networks have become one of the most popular feedforward neural networks, with applications in regression, classification and function approximation problems (Bishop, 1997; Haykin, 1994). The RBF network approximates nonlinear mappings by weighted sums of Gaussian kernels. Therefore, an RBF learning algorithm must estimate the centers of the units, their variances and the weights of the output layer. Typically, the learning process is separated into two steps: first, a nonlinear optimization procedure to select the centers and the variances and, second, a linear optimization step to fix the output weights. To simplify the nonlinear optimization step, the variances are usually fixed in advance and the centers are selected at random (Broomhead & Lowe, 1988) or by applying a clustering algorithm (Moody & Darken, 1989).
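As an illustration of this classical two-step procedure, the following sketch selects the centers at random from the data, fixes a common width in advance, and then obtains the output weights by linear least squares. The function names and the width heuristic are illustrative assumptions, not a prescription taken from the references above.

```python
# Minimal sketch of two-step RBF training: random centers + linear least squares.
# The width heuristic and function names are assumptions made for illustration.
import numpy as np

def train_rbf_two_step(x, y, n_units=10, width=None, seed=0):
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=n_units, replace=False)   # step 1: random centers
    if width is None:
        width = (x.max() - x.min()) / n_units               # crude fixed width
    # Design matrix of Gaussian activations, one column per unit
    phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (width ** 2))
    weights, *_ = np.linalg.lstsq(phi, y, rcond=None)       # step 2: linear fit
    return centers, width, weights

def rbf_predict(x, centers, width, weights):
    phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (width ** 2))
    return phi @ weights

# Example: approximate a 1-D nonlinear mapping and report the training MSE
x = np.linspace(-3.0, 3.0, 200)
y = np.sinc(x)
c, s, w = train_rbf_two_step(x, y, n_units=12)
print(np.mean((rbf_predict(x, c, s, w) - y) ** 2))
```

In practice the widths are often set from the spread of the selected centers; any reasonable heuristic suffices for this illustration.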

Other approaches try to solve the global nonlinear optimization problem using supervised (gradient-based) procedures that estimate the network parameters by minimizing the mean square error (MSE) between the desired output and the output of the network (Karayiannis, 1997; Lowe, 1989; Santamaría et al., 1999). However, gradient descent techniques tend to be computationally complex and to suffer from local minima.

As an alternative to global optimization procedures, a general and powerful method such as the Expectation–Maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977) can be applied to obtain maximum likelihood (ML) estimates of the network parameters. In the neural networks literature, the EM algorithm has been applied to a number of problems: supervised/unsupervised learning, classification/function approximation, etc. Here we concentrate on its application to supervised learning in function approximation problems. In this context, Jordan and Jacobs (1994) proposed to use the EM algorithm to train the mixture-of-experts architecture for regression problems. The EM algorithm has also been applied to estimate the input/output joint pdf, modeled through a Gaussian mixture model, and then to obtain the regressor from the conditional pdf (Ghahramani & Jordan, 1994). In both cases the missing data select the most likely member of the mixture given the observations, and then each member is trained independently.

More recently, the EM algorithm has been applied to the efficient training of feedforward and recurrent networks (Ma & Ji, 1998; Ma et al., 1997). The work in Ma et al. (1997) connects to the previous work of Feder and Weinstein (1988) on estimating superimposed signals in noise. In both methods, the E-step reduces to decomposing, at each iteration, the total residual into G components (G being the number of neurons). In Feder and Weinstein (1988), the variables used to decompose the residual can take arbitrary values, as long as they sum to one, but must be constant over the function domain: for instance, they propose to decompose the residual into G equal components. In Ma et al. (1997), the residual is decomposed proportionally to the weights of the output layer. Both approaches work well for feedforward networks with global activation functions such as the MLP, but tend to be rather slow for networks with local activation functions, since each individual unit is forced to approximate regions far away from the domain of its activation function.
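A minimal sketch of this E-step residual decomposition, with notation and function names chosen here for illustration, is given below; scheme="equal" corresponds to the constant decomposition of Feder and Weinstein (1988) and scheme="weights" to the weight-proportional decomposition of Ma et al. (1997).

```python
# Sketch of the residual decomposition used in the EM approaches discussed above.
# Each unit i receives a pseudo-target: its own current output plus a share
# beta_i of the global residual, with the betas summing to one.
import numpy as np

def decompose_residual(y, unit_outputs, lambdas, scheme="equal"):
    """y: (N,) targets; unit_outputs: (N, G) with lambda_i * o_i(x) in column i."""
    G = unit_outputs.shape[1]
    residual = y - unit_outputs.sum(axis=1)              # global fitting error
    if scheme == "equal":                                # Feder & Weinstein (1988)
        betas = np.full(G, 1.0 / G)
    elif scheme == "weights":                            # Ma et al. (1997)
        betas = np.abs(lambdas) / np.abs(lambdas).sum()
    else:
        raise ValueError("unknown scheme")
    # Pseudo-target for each unit; the M-step then refits each unit independently
    return unit_outputs + residual[:, None] * betas[None, :]
```

After this E-step, the M-step refits each unit independently to its pseudo-target, which is what decouples the overall optimization.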

To overcome this drawback, we propose in this paper a new EM algorithm, specific to RBF networks, which aims to accelerate convergence. We perform a soft decomposition of the residual that takes into account the locality of the basis functions. Several examples show that this modification speeds up convergence in comparison with previous EM approaches.

The paper is organized as follows. In Section 2, the main features of the EM algorithm are presented. In Section 3, we review some EM-based approaches for the training of feedforward neural networks. In Section 4, the EM algorithm is applied to train an RBF network taking advantage of the local nature of its activation function. Simulation results are provided in Section 5 to validate the proposed algorithm. In Section 6, we apply this algorithm to the small-signal modeling of a MESFET transistor to reproduce its intermodulation behavior. Finally, the main conclusions are presented in Section 7.

Section snippets

The EM algorithm

The EM algorithm (Dempster et al., 1977) is a general method for ML estimation of parameters given incomplete data. The word incomplete indicates that, in this formulation, it is convenient to associate two sets of random variables with the problem, Y and V, only one of which, Y, is directly observable, while the underlying model is expressed in terms of both sets, i.e. in terms of Z={Y,V}. In the original formulation of the EM algorithm, Y was called the incomplete data, V the missing data, and Z the complete data.
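For reference, the two steps of the standard EM iteration, written in the notation above (θ denotes the parameter vector, the superscript k the iteration index, and f_Z the pdf of the complete data), are:

```latex
% Standard EM iteration (general form, not specific to this paper)
\text{E-step:}\quad Q\bigl(\theta \mid \theta^{(k)}\bigr)
    = \mathrm{E}\!\left[\log f_Z(Z;\theta) \,\middle|\, Y = y;\ \theta^{(k)}\right]
\qquad
\text{M-step:}\quad \theta^{(k+1)}
    = \arg\max_{\theta}\; Q\bigl(\theta \mid \theta^{(k)}\bigr)
```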

EM-based training of feedforward networks

In this section we introduce the notation and describe previous work on training two-layer feedforward networks using EM-based approaches. Without loss of generality, let us consider an RBF network with G Gaussian units, which approximates a one-dimensional mapping, g(x): R → R, as

\tilde{g}(x) = \sum_{i=1}^{G} \lambda_i\, o_i(x),

where i indexes the RBF units, λ_i is the amplitude, and o_i(x) is the activation function of each unit, given by

o_i(x) = \exp\!\left(-\frac{(x-\mu_i)^2}{\sigma_i^2}\right).

Our training problem consists in estimating the amplitudes, centers and variances of the G units.
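To connect this model with the EM framework of the previous section, the overall approximation error can be split into per-unit components, in the spirit of the superimposed-signals formulation of Feder and Weinstein (1988); the error symbols e_p and e_{i,p} below are introduced here only for illustration.

```latex
% Illustrative complete-data decomposition for a pattern (x_p, y_p);
% e_p and e_{i,p} are symbols introduced here, not taken verbatim from the paper.
y_p = \sum_{i=1}^{G} \lambda_i\, o_i(x_p) + e_p,
\qquad e_p = \sum_{i=1}^{G} e_{i,p},
\qquad y_{i,p} = \lambda_i\, o_i(x_p) + e_{i,p},
\qquad \sum_{i=1}^{G} y_{i,p} = y_p .
```

The missing data are then the per-unit targets y_{i,p}, and the decoupling variables determine how the observed residual e_p is shared among them.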

Fast EM training of RBF networks

The decoupling variables discussed in the previous section, which are constant over the whole input space, have provided good results for feedforward neural networks with nonlocal activation functions, such as the Multilayer Perceptron (MLP). However, they are not well suited to networks with local activation functions, such as the RBF. For this type of network, convergence is slow because, with the previous decoupling variables, at each M-step we are trying to fit a Gaussian to a very large region of the input space.
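One way to picture the soft, input-dependent decomposition proposed here is sketched below: each unit receives a share of the residual at a pattern proportional to a normalized, posterior-like responsibility of that unit for the pattern. Taking the responsibility proportional to the unit's Gaussian activation is an illustrative approximation; the paper estimates the decoupling variables as the posterior probability of each unit given the complete data.

```python
# Sketch of a soft, input-dependent E-step for a 1-D RBF network.
# The responsibilities used here (normalized activations) are an illustrative
# stand-in for the posterior probabilities derived in the paper.
import numpy as np

def soft_decompose_residual(y, x, centers, widths, lambdas):
    # Gaussian activations o_i(x_p), shape (N, G)
    act = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (widths[None, :] ** 2))
    outputs = act * lambdas[None, :]                     # lambda_i * o_i(x_p)
    residual = y - outputs.sum(axis=1)                   # global residual
    # Input-dependent shares: one set of betas per pattern, summing to one
    resp = act / np.clip(act.sum(axis=1, keepdims=True), 1e-12, None)
    # Each unit only "sees" the residual inside its own region of the input space
    return outputs + residual[:, None] * resp
```

Because the shares vanish far from a unit's center, each M-step only asks that unit to explain errors in its own neighborhood, which is the mechanism behind the faster convergence reported in the paper.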

Experimental results

In this experiment we consider the set of eight 2-D functions used in Cherkassky, Gehring, and Mulier (1996) to compare the performance of several adaptive methods. These functions, which form a suitable test set, are described in Table 1. We use a generalized radial basis function (GRBF) network, which allows a different variance along each input dimension.
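For concreteness, a GRBF unit of this kind is an axis-aligned Gaussian with a separate width per input dimension; the sketch below shows how its activations can be evaluated, with array shapes and names chosen here for illustration.

```python
# Sketch of generalized RBF (GRBF) activations with per-dimension widths.
import numpy as np

def grbf_activations(X, centers, widths):
    """X: (N, D) inputs; centers and widths: (G, D), one row per unit."""
    diff = X[:, None, :] - centers[None, :, :]           # (N, G, D)
    z = (diff ** 2) / (widths[None, :, :] ** 2)          # scaled per dimension
    return np.exp(-z.sum(axis=2))                        # (N, G) activations

# Example with 2-D inputs, as in the Cherkassky et al. (1996) test functions
X = np.random.default_rng(0).uniform(0.0, 1.0, size=(5, 2))
centers = np.array([[0.3, 0.3], [0.7, 0.7]])
widths = np.array([[0.2, 0.1], [0.1, 0.3]])
print(grbf_activations(X, centers, widths).shape)        # (5, 2)
```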

First, we compare the performance of the proposed soft-EM approach with the classical EM alternatives (Feder & Weinstein, 1988; Ma et al., 1997).

Nonlinear small-signal modeling of a MESFET for intermodulation distortion characterization

In this section, a GRBF network trained with the proposed soft-EM procedure is used to reproduce the small-signal intermodulation behavior of a microwave MESFET transistor. Fig. 2 shows the most widely accepted equivalent nonlinear circuit of a MESFET in its saturated region. The predominant nonlinearity in this model is the drain-to-source current Ids, which depends on both the drain-to-source voltage, Vds, and the gate-to-source voltage, Vgs. Here we are going to model this static nonlinearity.

Conclusions and future work

The decoupling variables used in the E-step of EM-based learning algorithms can be selected to control the rate of convergence of the algorithm. We have studied in this paper a suitable selection of these variables for feedforward networks with local activation functions (mainly, RBF networks). Specifically, these variables are estimated as the posterior probability of each RBF unit given each pattern of the selected complete data. By means of several simulation examples, it has been shown that this soft decomposition speeds up convergence with respect to previous EM-based training approaches.

Acknowledgements

This work has been partially supported by the European Community and the Spanish Government through FEDER project 1FD97-1863-C02-01. The authors also thank the reviewers for carefully reading the manuscript and for their many helpful comments.

References (18)

  • S. Ma et al. An efficient EM-based training algorithm for feedforward neural networks. Neural Networks (1997)

  • I. Santamaría et al. A nonlinear MESFET model for intermodulation analysis using a generalized radial basis function network. Neurocomputing (1999)

  • C. Bishop. Neural networks for pattern recognition (1997)

  • D.S. Broomhead et al. Multivariable functional interpolation and adaptive networks. Complex Systems (1988)

  • S. Chen et al. Orthogonal least squares learning algorithm for radial basis functions. IEEE Transactions on Neural Networks (1991)

  • V. Cherkassky et al. Comparison of adaptive methods for function estimation from samples. IEEE Transactions on Neural Networks (1996)

  • A.P. Dempster et al. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (1977)

  • M. Feder et al. Parameter estimation of superimposed signals using the EM algorithm. IEEE Transactions on Acoustics, Speech and Signal Processing (1988)

  • Z. Ghahramani et al. Supervised learning from incomplete data via an EM approach. Advances in Neural Information Processing Systems (1994)
