Abstract
In this article, we have proposed a methodology for making a radial basis function network (RBFN) robust with respect to additive and multiplicative input noises. This is achieved by properly selecting the centers and widths for the radial basis function (RBF) units of the hidden layer. For this purpose, firstly, a set of self-organizing map (SOM) networks is trained for center selection. For training a SOM network, random Gaussian noise is injected into the samples of each class of the data set. The number of SOM networks is the same as the number of classes present in the data set, and each of the SOM networks is trained separately by the samples belonging to a particular class. The weight vector associated with a unit in the output layer of the SOM network corresponding to a class is used as the center of an RBF unit for that class. To determine the widths of the RBF units, the p-nearest neighbor algorithm is used class-wise. Proper selection of centers and widths makes the RBFN robust with respect to input perturbation and to outliers present in the data set. The weights between the hidden and output layers of the RBFN are obtained by the pseudo-inverse method. To test the robustness of the proposed method in additive and multiplicative noise scenarios, ten standard data sets have been used for classification. The proposed method has been compared with three existing methods, in which the centers are generated in three ways: randomly, using the k-means algorithm, and using a SOM network. Simulation results show the superiority of the proposed method over those methods. The Wilcoxon signed-rank test also shows that the proposed method is statistically better than those methods.
1 Introduction
Radial basis function network (RBFN), a popular artificial neural network (ANN), has been used in many applications [1, 2]. However, like other neural network models, it is not free from input perturbation. It may encounter input perturbation in real-life applications; e.g., inputs that come from electronic sensors, such as microphones or thermocouples, may be altered [3]. This alteration can be additive [4] or multiplicative [3]. When the perturbation of an input is proportional to its magnitude, it is called multiplicative perturbation, and when the perturbation is an additive Gaussian noise, it is called additive perturbation. The conventional RBFN is sensitive to changes in its input [4]. It can be made robust to input perturbation by proper selection of the RBF parameter values [5, 6].
The dimensionality of the hidden layer is an important parameter for making an RBFN efficient. If the number of RBF units in the hidden layer is not sufficient, it may cause underfitting [7]. According to Cover’s theorem [8], the number of units in the hidden layer should be more than the number of features of the input pattern to avoid underfitting. So, we have used \(n \times C\) RBF units in the hidden layer, where n and C are, respectively, the number of features and the number of classes present in the data set. This makes the number of RBF units C times the number of features. The choice of \(n \times C\) RBF units allots the same number of units, n, to each class, so that the data representing a class are distributed uniformly among the RBF units.
In this article, we have proposed an RBFN which is robust with respect to input alterations. Here, a self-organizing map (SOM)-based clustering algorithm has been used to select the centers of the RBF units in the hidden layer of the RBFN. A SOM network has been trained using the samples belonging to a particular class of the data set, and C such SOM networks have been trained separately for the C classes. For training a SOM network, independent random Gaussian noise has been injected into the input samples of the corresponding class. We consider n units in the output layer of the SOM, which yields n weight vectors from each of the trained SOM networks. These weight vectors are used as n cluster centers for a particular class. So, for each class, we have a set of n centers corresponding to n RBF units. We select all the cluster centers corresponding to the C classes. Thus, for C classes, the number of RBF units in the hidden layer is \(n \times C\). For the selection of the widths of the RBF units, the p-nearest neighbor (p-NN) algorithm has been modified and used class-wise. Finally, the pseudo-inverse method has been used to compute the weights between the hidden and output layers of the RBFN. The proposed method has been tested for robustness on ten standard classification data sets. The experimental results establish the ability of the proposed method to make the network robust with respect to multiplicative and additive input noises.
The remainder of this article is organized as follows. In Sect. 2, the RBFN and the SOM network are described in brief. Some of the works in the literature associated with the robustness and generalization ability of the RBFN are reviewed in Sect. 3. In Sect. 4, we illustrate the proposed method; the philosophy behind it is also discussed in this section. The simulation results and comparisons with other existing methods are provided in Sect. 5. Finally, we draw the conclusions in Sect. 6.
2 The preliminaries
2.1 Fundamentals of the RBF network
A generic model of an RBFN with three layers is presented here. We consider a data set \({\mathcal {D}} = \{({\mathbf {x}}^p,{\mathbf {d}}^p)\}^{P}_{p=1}\), where the input pattern \({\mathbf {x}}^p \in \mathbb {R}^n\) and the corresponding desired output \({\mathbf {d}}^p \in \mathbb {R}^J\). P, n, and J are, respectively, the number of patterns present in the data set, the number of features present in a pattern, and the number of outputs. The RBFN consists of n linear nodes in the input layer. Each node of the hidden layer of the RBFN is called an RBF unit. Each RBF unit is associated with a local receptive field, and each receptive field is associated with a center and a width. The dimension of the center is the same as the input dimension, i.e., n. The widths are the scaling factors for the outputs of the RBF units. The output layer consists of J linear output nodes. For an input pattern \({\mathbf {x}}^p = \left( x^p_{1}, x^p_{2}, \ldots , x^p_{n}\right)\), the RBFN, with radial basis activation functions (kernel basis functions) in the hidden layer, produces an output \({\mathbf {z}}^p = \left( z^p_{1}, z^p_{2}, \ldots , z^p_{H}\right)\). Here, H is the number of RBF units in the RBFN. A linear weighted summation of the outputs of the hidden units produces the approximated output \({\mathbf {o}}^p = \left( o^p_{1}, o^p_{2}, \ldots , o^p_{J}\right)\) of the RBFN. The basic architecture of the RBF network is shown in Fig. 1. The function of the RBFN is described in the following three steps.
1. Each input \(x^p_i\) in the input layer is scaled by the weight \(w_{ih}\), as given in (1), where \(w_{ih}\) is the weight between the ith unit of the input layer and the hth RBF unit of the hidden layer.

$$\begin{aligned} x^p_{i_h} = x^p_i w_{ih} \end{aligned}$$
(1)

Thus, the vector \({\mathbf {x}}^p_h = \left( x^p_{1_h}, x^p_{2_h}, \ldots , x^p_{n_h}\right)\) is the scaled input to the hth RBF unit.
2. The output of the hth RBF unit of the hidden layer is given in (2).

$$\begin{aligned} z^{p}_{h} = \varphi \left( {\mathbf {x}}^{p}_h, \mathbf {c}_{h}, \sigma _{h}\right) = \exp \left( -\frac{||{\mathbf {x}}^p_h - \mathbf {c}_h||^{2}}{\sigma _h^{2}}\right) \end{aligned}$$
(2)

Here, \(\varphi \left( \cdot \right)\) is the kernel basis function and \(||\cdot ||\) is the \(L^2\) norm of the operand vectors. \(\mathbf {c}_h\) and \(\sigma _h\) are, respectively, the center and the width of the hth RBF unit.
3. The network output for the input pattern \({\mathbf {x}}^p\) is the sum of the weighted outputs of the RBF units, as given in (3).

$$o^p_j = \sum _{h=1}^{H} z^p_h w_{hj} + w_0$$
(3)

Here, \(w_{hj}\) represents the weight between the hth RBF unit and the jth output unit, and \(w_0\) is the weight of the bias unit of the hidden layer.
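As an illustration, the following is a minimal NumPy sketch of the forward pass in (1)-(3); the function and variable names are ours, and the input-layer weights \(w_{ih}\) are taken as 1, as in the common RBFN formulation, so that \({\mathbf {x}}^p_h = {\mathbf {x}}^p\).

```python
import numpy as np

def rbf_forward(X, centers, widths, W, w0):
    """Forward pass of the three-layer RBFN in (1)-(3).

    X       : (P, n) input patterns
    centers : (H, n) centers c_h of the RBF units
    widths  : (H,)   widths sigma_h of the RBF units
    W       : (H, J) hidden-to-output weights w_hj
    w0      : (J,)   bias weight(s) of the hidden layer
    """
    # Eq. (2): z_h = exp(-||x - c_h||^2 / sigma_h^2)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (P, H)
    Z = np.exp(-sq_dist / widths[None, :] ** 2)
    # Eq. (3): linear weighted summation of the hidden outputs plus bias
    return Z @ W + w0
```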
2.2 Fundamentals of the SOM network
A SOM network consists of two layers: an input layer and an output layer. The n features of the input pattern \({\mathbf {x}}^p = \left( x^p_1, x^p_2, \ldots , x^p_{n}\right)\) are fed into the input layer of the SOM network. Thus, the input layer of the SOM network consists of n linear nodes. The output layer of a SOM network, in general, is a multi-dimensional lattice. In this study, we consider a one-dimensional lattice/grid of size m. Each unit in the output layer is denoted by \(u_j\, (j = 1, 2, \ldots , m)\). The input layer is fully connected with the output layer. The connection between the ith unit of the input layer and the jth unit of the output layer is associated with a weight \(w_{ij}\). Figure 2 illustrates the basic architecture of the SOM network.
A weight vector corresponding to the jth output unit is represented by \({\mathbf {w}}_j = [w_{1j}, w_{2j}, \ldots , w_{nj}]\). A similarity measure \(\texttt {d}_j\) between the input pattern \({\mathbf {x}}^p\) and the weight vector \({\mathbf {w}}_j\) for the output unit j is usually computed as \(\texttt {d}_j = ||{\mathbf {x}}^p-{\mathbf {w}}_j||\). The output unit with the smallest \(\texttt {d}_j\) is the winner for the input pattern \({\mathbf {x}}^p\). The weights of the winner, along with the weights of the units in its topological neighborhood, are updated in the direction of the input pattern. The influence of the update decays exponentially with distance from the winner, which is at the center of the topological neighborhood. The Gaussian function, presented in (4), is generally used as the neighborhood function.

$$G\left( u_j, \hat{u}\right) = \exp \left( -\frac{d^{2}\left( u_j, \hat{u}\right) }{2\sigma ^{2}}\right)$$
(4)

Here, \(G\left( \cdot \right)\) is the topological neighborhood function, \(\hat{u}\) and \(u_j\) are, respectively, the winning unit and the jth unit of the output layer of the SOM network, \(d\left( u_j, \hat{u}\right)\) is the lattice distance between them, and \(\sigma\) is the width of the neighborhood function. The weight updating formula is given in (5).

$$\mathbf {w}_j^{t+1} = \mathbf {w}_j^{t} + \eta ^t\, G\left( u_j, \hat{u}\right) \left( {\mathbf {x}}^p - \mathbf {w}_j^{t}\right)$$
(5)

Here, \(\eta ^t\) and \({\mathbf {w}}_j^{t}\) are, respectively, the learning rate and the weight vector at time t. With repeated presentations of the training patterns, the weight vectors move toward the input patterns. The updating of the neighbors' weights preserves the topology of the input patterns in the output layer.
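A short sketch of one such competition-and-update step on a one-dimensional lattice, written against (4) and (5); the exponential shrinking of the neighborhood width and of the learning rate over time is a common convention assumed here, not a prescription of this paper.

```python
import numpy as np

def som_step(W, x, t, T, eta0=0.9, sigma0=None):
    """One Kohonen competition-and-update step on a 1-D lattice of m units.

    W : (m, n) weight vectors w_j    x : (n,) input pattern
    t : current step                 T : time constant of the decay
    """
    m = W.shape[0]
    if sigma0 is None:
        sigma0 = m / 2.0
    # Competition: the winner minimizes d_j = ||x - w_j||.
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    # Gaussian neighborhood, Eq. (4), with width shrinking over time.
    sigma_t = sigma0 * np.exp(-t / T)
    G = np.exp(-((np.arange(m) - winner) ** 2) / (2.0 * sigma_t ** 2))
    # Weight update, Eq. (5), moving each w_j toward x.
    eta_t = eta0 * np.exp(-t / T)
    W += eta_t * G[:, None] * (x - W)
    return W
```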
3 The literature
Proper selection of the centers and widths of the basis functions associated with the RBF units plays a significant role in the robustness and generalization of the RBFN. In the literature, many notable works have dealt with the parameter selection of RBFN to improve the generalization [9,10,11,12,13,14,15,16,17,18,19,20,21] and robustness [3, 4, 7, 22,23,24,25,26,27,28,29] of the network. In this section, we discuss some of these related works in brief.
Moody and Darken [9] have used k-means clustering and the p-NN heuristic to compute, respectively, the centers and the widths of the receptive fields in the hidden layer. The p-NN-based width selection method helps to achieve overlapping kernel functions between each RBF unit and its neighbors. Thus, the RBF units form a smooth and contiguous interpolation over the input space. They have shown that the centers and widths of the RBF units control the generalization abilities of the RBFN. In [10], the author optimized the RBF centers via the expectation maximization algorithm, where the centers were initialized by the centroids obtained from a clustering algorithm. In [11], Chen et al. have proposed the orthogonal least squares (OLS) method to obtain optimal centers for the RBF units. For this purpose, the regressors, i.e., fixed functions of the training vectors, were first transformed into orthogonal basis vectors (OBVs). Later, a subset of these OBVs was selected as centers of the RBF units by a forward regression method. The authors in [12] have used a genetic algorithm for the selection of the centers and widths of RBFN. Though their algorithm is robust with respect to local minima owing to its global population-based search, it becomes slow when the search space is large. Support vectors, generated by a support vector machine (SVM), have been used as centers of RBF units in [13]. Simulation results have shown that this method outperforms k-means clustering-based RBFN. A hierarchical full Bayesian approach for RBFN has been proposed in [22]. In the computation of the joint distribution of the RBF parameters and the number of basis functions, the authors have considered the following: (i) the Markov chain Monte Carlo (MCMC) algorithm when the number of RBF units is known and (ii) the reversible-jump MCMC algorithm when the number of RBF units is unknown. They have proved the convergence of the algorithms and shown that they are robust against the specification of the prior. Mao [14] has selected RBF centers based on the class separability obtained by the Fisher ratio. Here, the Fisher ratio has been incorporated into an orthogonal transform and a forward selection procedure to select the RBF centers. To evaluate the class separability provided by each RBF unit independently, the correlations among the outputs of the RBF units were decoupled. The empirical results demonstrate that this algorithm obtains a reduced set of RBF units which increases the class separation. The authors in [15] reduced the hidden layer space while preserving the major structure of the data. They have preserved the relative locations of the samples of the input space, including those samples which are close to the boundaries between different classes. This is achieved through the selection of the centers of the basis functions. Fisher's class separation criterion has been used for the selection of the widths of the basis functions. Simulation results demonstrate that their method improves the generalization of the RBFN, though training the RBFN is computationally expensive due to the involvement of repeated singular value decompositions.
Webb [4] has used a noise-dependent regularizer with the sum-of-squared-error function to make the function approximation robust in the presence of noise in the test patterns. Given the pth data pattern \(x^p\), let \(E\left( x^p\right)\) be the error between the approximation \(f(x^p)\) and the desired value \(f^p\). Let the pattern \(x^p\) be perturbed by additive noise \(\aleph\). Assuming the value of \(\aleph\) is small and \(\mathbb {E}[\aleph ] = 0\), Taylor's theorem gives the expectation of the error in the presence of noise as in (6).

$$\mathbb {E}\left[ E\left( x^p + \aleph \right) \right] = E\left( x^p\right) + \frac{1}{2}\mathbb {E}\left[ \aleph ^{*}H^p\aleph \right]$$
(6)

\(E(x^p)\) is the error in the absence of noise and \(\frac{1}{2}\mathbb {E}\left[ \aleph ^{*}H^p\aleph \right]\) is the additional error term, where \(H^p\) is the Hessian matrix with respect to the input, evaluated for the pth pattern as given in (7).

$$H^p_{ij} = \left. \frac{\partial ^{2} E}{\partial x_i\, \partial x_j}\right| _{x = x^p}$$
(7)

For \(\mathbb {E}[\aleph _i\aleph _j] = \sigma ^2\delta _{ij}\), the additional error term is

$$\frac{\sigma ^2}{2}\, T_r\left( H^p\right)$$
(8)

where \(T_r\) is the matrix trace operation. Averaging over all data patterns gives the mean expected error as in (9).

$$\bar{E} = \frac{1}{P}\sum _{p=1}^{P}\left[ E\left( x^p\right) + \frac{\sigma ^2}{2}\, T_r\left( H^p\right) \right]$$
(9)
Orr [16] has used a weight-decay-regularizer-based forward selection method along with cross validation to select the centers of RBF units to improve the generalization of RBFN. A gradient descent-based learning method was used to fine-tune the centers and widths of the RBF units in [17]. Tuning was applied after evaluating the parameters by a clustering algorithm and pseudo matrix inversion. In [18], the authors have proposed a three-phase learning algorithm for RBFN. In the first phase, the centers and widths of the RBF units are determined. The weights between the hidden and output layers are determined in the second phase. The third phase is gradient descent-based error backpropagation, used to fine-tune all the RBF parameters. They have shown that the performance of a two-phase learning algorithm for RBFN is improved by adding the third phase. In [30], for the selection of the parameters of RBFN, the authors have minimized a localized generalization error bound for unseen patterns. For this purpose, the objective function in (10) was minimized; it gives an upper bound on the MSE for unknown patterns in the q-neighborhood of the training patterns.
Here \(\mathcal {R}_\mathrm{emp} = \frac{1}{P}\sum _{p=1}^{P}\left( {\mathbf {o}}^p-{\mathbf {d}}^p\right) ^2\) and \(\Delta Y_p=\left( {\mathbf {o}}-{\mathbf {o}}^p\right) ^2\), where \({\mathbf {o}}^p\) and \({\mathbf {d}}^p\) are, respectively, the obtained and desired outputs for the pth pattern. In the case of center and width selection, their method outperforms RBF networks trained by minimizing the training error only. Based on the Gladyshev theorem, Ho et al. [27] have proved the convergence of many online training algorithms, such as additive noise injection in the input and additive/multiplicative noise injection in the weights. They have shown that the objective function obtained by additive input noise injection during training is equivalent to a Tikhonov regularization-based approach.
Fritzke has used a growing cell self-organizing map in [19] for the selection of the centers of RBF units. The network adds a new unit whenever an input pattern is sufficiently far away from all existing RBF centers. This helps to find better positions of the centers of the RBF units with respect to generalization. Another variant of SOM, called the probabilistic self-organizing map, has been used in [20] for the parameter selection in RBFN to reduce the root-mean-square error. In [21], the authors have used SOM-based clustering and the k-means algorithm for the selection of the centers of RBFN to equalize time-varying nonlinear channels in a satellite UMTS channel. Empirically, they have shown that the performance is better in the case of SOM-based clustering. In [28], the authors performed forward selection and SOM-based clustering for the selection of the centers of RBF units. Using simulations, they have shown that an RBFN trained by SOM is less sensitive to the noise present in the input compared to the forward selection method. In [31], the authors have used an incremental SOM to find the centers of the RBF units in the hidden layer to improve the classification accuracy of the corresponding RBFN. Here, if any new samples are added to the network, the topology of the SOM network is modified by generating new cluster centers.
The effect of input perturbation and weight perturbation on the RBFN output is analyzed in [23]. The authors have determined the output covariance matrix of an RBFN by Taylor series expansion and measured the sensitivity of the output in the presence of perturbation. In [24], the authors have studied the importance of input perturbation using a wavelet adaptive neural network. They have estimated the sensitivity on an input as the ratio between the standard deviation of the prediction and that of the altered input, and shown that the sensitivity is highly correlated with the input perturbation. Lee et al., in [25], have proposed an RBFN which is robust with respect to functions that have a constant value over an interval of time. They have shown that the proposed RBFN is also robust against outliers. The activation function in their RBFN is a constant-valued function composed of a set of sigmoid functions. The new activation function was used with a robust objective function, given in (11).

$$E_R = \frac{1}{P}\sum _{p=1}^{P}\phi \left( e_p\right)$$
(11)
Here \(e_p = t_p - f(x_p)\) is the error for the pth training sample with desired output \(t_p\), \(\phi (\cdot )\) is a continuous function with constant \(\phi (0)\), and P is the total number of input samples. Note that \(E_R\) becomes the least squares criterion when \(\phi (e_p)=e_p^2\) and \(\phi (0)=0\). Bruzzone and Prieto [26] have used the k-means algorithm for class-wise selection of the centers of the receptive fields. For the selection of the widths of the receptive fields, they have used the p-NN algorithm only when the number of centers belonging to the class is more than p; otherwise, the standard deviation computed over all training samples belonging to that class is used. Bernier et al. [3] have studied the fault-tolerance behavior of the RBF network for both additive and multiplicative input perturbations. They [3] have proposed a measure of sensitivity, known as mean square sensitivity (MSS), which is defined in (12).
Here \(SS_i^m\) is the sensitivity of the ith unit in the mth layer of the network, P is the number of input patterns, and \(N_m\) is the number of neurons in the mth layer of RBFN. Using Taylor expansion, they have also defined the MSS for additive and multiplicative input noises \((\text {MSS}_\mathrm{in})\) as in (13).
Here, \(\varepsilon (p)\) is the instantaneous squared error for the pth input pattern and \(x^{p}_{k}\) is the kth feature of the pth input pattern. A low value of MSS indicates low sensitivity of the network to input perturbation. In [29], the authors have proposed an algorithm for sensitivity analysis of RBFN. The centers and the widths of the RBF units are determined by maximizing the output sensitivity over the training patterns. The least number of such hidden RBF units with maximal sensitivity represents the most generalized RBF network. The authors have defined the output deviation \(\Delta y_j\) as in (14).
Here, \(z_i\) is the response of the ith hidden unit. \(\hat{c}_i\) and \(\hat{\sigma }_i\) are, respectively, the center and the width of the ith hidden unit in the presence of noise. \(\hat{c}_i = c_i + \Delta c_i\) is the center perturbed by \(\Delta c_i\), and the interconnection weight under perturbation \(\Delta w_j\) is \(\hat{w}_j = w_j + \Delta w_j\). Here \(w_j\) has been computed using pseudo matrix inversion. The robustness of RBFNs with respect to noise in the input is analyzed in [6]. The authors have determined upper bounds on the MSE for noisy inputs and network parameters when the network parameters are constrained. These parameter constraints make the RBFN robust to noise. Moreover, they have presented a technique to identify highly sensitive parameters and inputs in the network. Yu et al. [7] have presented a review of different approaches to designing and training RBFN. They have also proposed a new approach, called the ErrCor algorithm, to find the proper size of the RBFN and the initialization of the centers of the RBF units. They have tested the robustness of the algorithm by injecting noise into the test patterns. Simulation results show that it improves the robustness of the network.
In [32], the authors have modeled a new gradient-based sequential RBFN. Here, they initially develop a meta-model of the RBFN with a small training set. Then, they compute the gradient over the current training set, add the points with maximum gradient to the current training set, and rebuild the model. These new points, added to the training set, help to capture the information of highly nonlinear regions, which in consequence improves the accuracy in each sequential step. Han et al., in [33], have proposed a variant of RBFN called the self-organizing RBF (SORBF) neural network, where both the network structure and the network parameters are variable. They used growing and pruning methods simultaneously to set the number of RBF nodes and the network weights. A node is added or removed based on its distance from the radius of the receptive field. Ding et al., in [34], have used a genetic algorithm to optimize the RBF network structure and weights. They have used binary encoding for the network structure and real encoding for the weights between the hidden and output layers. Further, they have fine-tuned the weights by the pseudo-inverse/least-mean-square method. Simulation results have shown that their method improves the generalization of RBFN. The author in [35] has proposed a new RBFN, called the restricted radial basis function network (rRBF), to visualize the patterns in a low-dimensional space and perform the classification task. The low-dimensional internal representation of rRBF helps to visualize the structure of the data set. Consequently, it helps to improve the generalization of the classifier by providing more training data to the regions where, according to the visualized internal representation, the classifier is weak. They have empirically demonstrated the dimensionality reduction and classification abilities of the rRBF network.
4 Proposed work
4.1 Mathematical model
RBFN is a local approximation network, where each RBF unit of the hidden layer acts as a local receptive field and the network output is determined by the local receptive fields. Proper selection of the centers and the widths of the kernel basis functions associated with the RBF units is important for making the RBFN robust. If the cluster centers lie in the overlapping regions of different classes, the local receptive fields will fail to approximate the class boundaries. Similarly, in the case of width selection, if the width of a basis function is larger than the width of the corresponding class region, it may fail to perform the classification task. In this paper, our prime objective is to find a set of centers and widths for the RBF units that makes the RBFN robust with respect to input perturbation. The proposed approach is explained below.
We have considered an input-output data set \({\mathcal {D}} =\{({\mathbf {x}}_p,{\mathbf {d}}_p)\}^{P}_{p=1}\) for the RBFN and partitioned it class-wise. A set of SOM networks has been trained with the samples of the data set, where the samples of a particular class are used to train a particular SOM network. Thus, the number of SOM networks is the same as the number of classes present in the data set. In the proposed method, the number of units in the output layer of a SOM network is the same as the number of features of the data set. So, the output layer of each SOM network is a one-dimensional lattice of size n, where n is the number of features present in an input sample.
Let \({\mathbf {a}}^1\), \({\mathbf {a}}^2\), \(\ldots\), \({\mathbf {a}}^P \in \mathbb {R}^n\) be P random Gaussian vectors. Before training the SOM networks, the random vectors (noise) are added to the corresponding input patterns \({\mathbf {X}}= ({\mathbf {x}}^1, {\mathbf {x}}^2, \ldots , {\mathbf {x}}^P)\). The random variables in each random vector \({\mathbf {a}}^i = (a^i_1, a^i_2, \ldots , a^i_n)\) are independent and follow a zero-mean Gaussian distribution with variance v, whose probability density is given in (15).

$$\Pr \left( a^i_k\right) = \frac{1}{\sqrt{2\pi v}}\exp \left( -\frac{\left( a^i_k\right) ^2}{2v}\right)$$
(15)
Here, \({\mathbf {a}}^i\) and \({\mathbf {a}}^j\) are independent for all \(i \ne j\). The perturbed input for pattern p is given in (16) and (17) for additive and multiplicative noises, respectively.

$$\bar{{\mathbf {x}}}^p = {\mathbf {x}}^p + {\mathbf {a}}^p$$
(16)

$$\bar{x}^p_i = x^p_i\left( 1 + a^p_i\right) , \quad i = 1, 2, \ldots , n$$
(17)
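A minimal sketch of this perturbation step, following (16) and (17); the helper name and the generator seed are ours.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only for reproducibility

def perturb(X, v, mode="additive"):
    """Perturb the patterns X with i.i.d. zero-mean Gaussian noise of
    variance v, per Eq. (16) (additive) or Eq. (17) (multiplicative)."""
    A = rng.normal(0.0, np.sqrt(v), size=X.shape)  # a^p_i ~ N(0, v)
    if mode == "additive":
        return X + A                 # Eq. (16)
    return X * (1.0 + A)             # Eq. (17): noise proportional to magnitude
```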
Then the SOM network is trained using the perturbed input patterns \(\mathbf {\bar{X}}\). Let a unit i in the input layer be connected to a unit j in the output layer by a weight \(w_{ij}\), and let \({\mathbf {w}}_j = [w_{1j}, w_{2j}, \ldots , w_{nj}]\) represent the weight vector corresponding to the jth unit in the output layer. We take the similarity measure \(\text {d}_j\) between the perturbed input \({\mathbf {{\bar{x}}}}^p\) and the weight vector \({\mathbf {w}}_j\) for each unit of the output layer using the \(L^2\) norm. If the output unit \(\hat{u}\) wins the competition, then we update only the weights of the winning unit along with the weights of the units in its topological neighborhood in the direction of the input pattern. The weight update, defined earlier in (5), is changed to (18) due to the perturbation.

$$\mathbf {w}_j^{t+1} = \mathbf {w}_j^{t} + \eta ^t\, G\left( u_j, \hat{u}\right) \left( \bar{{\mathbf {x}}}^p - \mathbf {w}_j^{t}\right)$$
(18)
Here, \(\eta ^t\) and \({\mathbf {w}}_j^{t}\) are, respectively, the learning rate and the weights at time t. The neighborhood function \(G\left( \cdot \right)\), used in (18), is defined earlier in (4). During the first \(80\%\) of the training time, we change the learning rate \(\eta ^t\) with time as given in (19)
where \(\eta ^0\) is the initial learning rate and \(\tau\) is set to half of the training time. For the remaining \(20\%\) of the training time, the learning rate \(\eta\) is fixed at 0.05.
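Since (19) is not reproduced here, the sketch below reads "reduced uniformly from 0.9 to 0.05" (Sect. 5.1.1) as a linear decay over the first 80% of the training time; the exact decay form with \(\tau\) set to half the training time follows (19) in the original.

```python
def learning_rate(t, T, eta0=0.9, eta_min=0.05):
    """Learning-rate schedule for SOM training: decay during the first 80%
    of the training time T, then fixed at eta_min (linear-decay reading)."""
    t80 = 0.8 * T
    if t < t80:
        return eta0 + (eta_min - eta0) * (t / t80)  # 0.9 -> 0.05 uniformly
    return eta_min
```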
The weight vectors from the trained SOM networks are used as the cluster centers for the units of the hidden layer. Each of the SOM networks has the same number of weight vectors. This makes the total number of RBF units in the hidden layer equal to the number of classes multiplied by the number of features present in the data set, and it ensures that each class has an equal number of RBF units in the hidden layer.
To compute the widths of the RBF units, we have modified the p-NN algorithm. In the original p-NN algorithm, the p nearest neighbors are considered irrespective of their class membership. In the modified algorithm, the p nearest neighbors belonging to the same class, among the n nearest neighbors of the corresponding RBF unit, are considered; here, n is the number of centers present in each class. The width is taken as the distance to the p nearest cluster centers, scaled by the number of centers present in that class. Let \(\sigma _{h}\) be the width of the hth RBF unit; then (20) gives the value of \(\sigma _{h}\).
Here, \(\mathbf {c}_i\) and \(\mathbf {c}_h\) are, respectively, the centers of the ith and hth RBF units. Once the centers and widths of the RBF units are determined, the weight matrix \({\mathbf {W}}\) between the hidden and output layers of the RBFN is obtained by the pseudo-inverse method as given in (21).

$${\mathbf {W}} = {\mathbf {H}}^{+}{\mathbf {D}}$$
(21)
Here, \({\mathbf {H}}\) is the hidden layer output matrix and \(^+\) denotes the pseudo-inverse operator. \({\mathbf {D}}\) is the matrix of desired outputs corresponding to the input patterns \({\mathbf {X}}\). Figure 3 illustrates the basic architecture of the proposed method.
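The following sketch puts the two steps together for one class: widths by the modified p-NN rule and output weights by (21). The root-mean-square form of the p-NN distance is an assumption (the exact form is given by (20) in the original); `np.linalg.pinv` computes the Moore-Penrose pseudo-inverse \({\mathbf {H}}^{+}\).

```python
import numpy as np

def class_widths(centers, p, n):
    """Widths of the n RBF units of one class via the modified p-NN rule:
    only the p nearest centers of the SAME class are used, and the result
    is scaled down by n (RMS distance form assumed)."""
    sigmas = np.empty(len(centers))
    for h, c in enumerate(centers):
        d = np.linalg.norm(centers - c, axis=1)
        nearest = np.sort(d)[1:p + 1]          # drop the zero self-distance
        sigmas[h] = np.sqrt(np.mean(nearest ** 2)) / n
    return sigmas

def output_weights(H, D):
    """Hidden-to-output weight matrix by the pseudo-inverse method, Eq. (21)."""
    return np.linalg.pinv(H) @ D               # W = H^+ D
```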
Different functions of the proposed algorithm corresponding to the different layers are given below.

Input layer: the computation at the ith neuron of this layer is the linear activation \({\mathcal {S}}\) applied to the input feature \(x^p_i\).

Hidden layer: the computation at the hth neuron of this layer is the Gaussian kernel activation \({\mathcal {G}}\) of (2), with the center \(\mathbf {c}_h\) obtained from the trained SOM networks and the width \(\sigma _h\) obtained from (20).

Output layer: the computation at the jth neuron of this layer is the linear combination \({\mathcal {L}}\) of the hidden layer outputs, as in (3).

Here, \(P_c\) is the number of patterns available in each class, and \({\mathcal {S}}, {\mathcal {G}}\), and \({\mathcal {L}}\) are, respectively, the activation functions of the input, hidden, and output layers. The weights of the SOM networks, trained with noisy inputs, are used as the centers of the RBF units. To keep the widths of the RBF units that are close to class boundaries small, we have scaled them by a factor of n, which makes the widths less sensitive to the class separation boundaries.
4.2 Input perturbation scheme for training
During training, for choosing the standard deviation of the perturbation vector (noise) judiciously, we have considered a set of three standard deviations, \(\Sigma_1 = \{0.05, 0.1, 0.5\}\), for experimentation. We have performed tenfold cross validation on the Sonar data set for these three values of the standard deviation and evaluated the performance of the proposed method. We have observed that the proposed method performs best for the standard deviation \(\sigma = 0.1\). Thus, we have selected the perturbation vector with zero mean and standard deviation \(\sigma =0.1\).
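A sketch of this selection procedure; `fit_and_score` is a hypothetical callback standing in for training the proposed RBFN with a given training-noise standard deviation and returning its test accuracy.

```python
import numpy as np
from sklearn.model_selection import KFold

def select_noise_std(X, y, fit_and_score, stds=(0.05, 0.1, 0.5), folds=10):
    """Pick the training-noise standard deviation from Sigma_1 by tenfold
    cross validation; fit_and_score(X_tr, y_tr, X_te, y_te, std) is assumed
    to train the proposed RBFN and return its test accuracy."""
    kf = KFold(n_splits=folds, shuffle=True, random_state=0)
    mean_acc = {}
    for std in stds:
        scores = [fit_and_score(X[tr], y[tr], X[te], y[te], std)
                  for tr, te in kf.split(X)]
        mean_acc[std] = np.mean(scores)
    return max(mean_acc, key=mean_acc.get)     # e.g., 0.1 on Sonar
```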
4.3 Philosophy behind the proposed work
If the centers of the RBF units are selected using all the training samples together, mixed kernel functions might arise. This generates mixed clusters, where a cluster may not be associated only with the samples of a particular class, and it reduces the separability between the classes in the kernel space. p-NN is used for the selection of the widths of the RBF units. If the kernel functions are located in the boundary regions between classes, overlapping cluster centers will affect the selection of the widths of the RBF units in the p-NN-based procedure [26]. In this study, the cluster centers of the RBF units are generated by SOM networks using the training patterns belonging to the same class. This avoids the generation of mixed kernel functions.
If the clustering is not done class-wise [26], the kernel functions may overlap. In that case, using p-NN for the selection of the widths of the RBF units is not a good approach. In this study, we have proposed the modified p-NN to keep the widths small for the kernel functions located in the boundary regions between classes and large for the kernel functions located at the centers of the class regions. This avoids overlapping of the kernel functions.
Noise injection during clustering by the SOM networks is a novel approach for the selection of the centers of RBF units. Noise injection in the input patterns is equivalent to the Tikhonov regularization approach [36], which in turn reduces the wiggling effect [37] in the hyper-spheres of the local receptive fields of the RBF units. It makes the curvature of the hyper-spheres of the local receptive fields smooth. Due to the smoothness of the hyper-spheres, a small alteration in the input samples will not produce a large deviation in the output. Thus, the network will be less sensitive to input noise and to outliers present in the input data set, which in turn makes the RBFN robust.
5 Experimentation
5.1 Experimental design
The results presented in this section are based on ten standard classification data sets taken from [38]. The information about all the data sets and their class-wise distributions is shown in Table 1.
To assess the robustness of the proposed method, we have compared it with three different center selection methods existing in the literature: (i) Method I uses class-wise random data point selection [1], (ii) Method II uses class-wise k-means clustering [26], and (iii) Method III uses class-wise SOM-based clustering [28]. For all three existing methods, we have considered the original p-NN-based width selection method. Note that our model proposes two methods: one for center selection and another for width selection. To analyze the performance of the proposed width selection method alone, we have considered Method IV, where the proposed center selection method is combined with the original p-NN-based width selection method.
In all five methods, including the proposed one, the number of RBF units in the hidden layer is kept the same. The simulations are carried out in MATLAB (Version 8.1). We have first applied z-score normalization to the training data set; the mean and standard deviation found there have then been used to perform the z-score normalization of the test data set. The Wilcoxon signed-rank test has been used to assess the statistical significance of each pair of algorithms.
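A minimal sketch of this normalization protocol; computing the statistics on the training split only avoids leaking test information.

```python
import numpy as np

def zscore_fit_apply(X_train, X_test):
    """z-score normalization with statistics estimated on the training set
    and applied unchanged to the test set."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0.0] = 1.0            # guard against constant features
    return (X_train - mu) / sd, (X_test - mu) / sd
```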
5.1.1 Parameters settings
In the RBFN, the weight vectors between the hidden and output layers are initialized within the range \([-0.5, 0.5]\). For training the SOM networks, we have initialized the learning rate \(\eta\) to 0.9. During the first \(80\%\) of the training time, the learning rate \(\eta\) is reduced uniformly from 0.9 to 0.05 using (19). For the remaining \(20\%\) of the training time, the learning rate \(\eta\) is fixed at 0.05. The number of weight updates is 500 times the number of samples present in the data set. We have performed tenfold cross validation ten times for each data set and consider the mean as the classification accuracy for that data set.
5.1.2 Input perturbation scheme for testing
To mimic the scenarios of (i) additive noise and (ii) multiplicative noise, discussed earlier in Sect. 4.1, we have altered the test inputs using Gaussian noise with zero mean. For comparison purposes, we have used five distinct values of the standard deviation \((\sigma)\) of the Gaussian noise to represent five different noise conditions: \(\Sigma_2 = \{0.0, 0.1, 0.2, 0.3, 0.4\}\). Note that when \(\sigma =0.0\), the input is unaltered, representing the noise-free scenario.
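A sketch of the resulting evaluation loop, reusing the `perturb` helper sketched in Sect. 4.1; `model_accuracy` is a hypothetical scoring callback.

```python
def noisy_test_accuracy(model_accuracy, X_test, y_test, perturb):
    """Score a trained classifier under the noise conditions of Sigma_2
    for both perturbation modes; sigma = 0.0 leaves the inputs unaltered."""
    results = {}
    for mode in ("additive", "multiplicative"):
        for std in (0.0, 0.1, 0.2, 0.3, 0.4):
            Xn = X_test if std == 0.0 else perturb(X_test, std ** 2, mode)
            results[(mode, std)] = model_accuracy(Xn, y_test)
    return results
```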
5.2 Experimental results and analysis
The detailed results of the experiments are provided in Tables 2, 3, 4, 5, 6, 7, 8, 9, 10 and 11 for the ten data sets mentioned earlier. Tables 2, 3, 4, 5 and 6 show the classification accuracy for the five values of the standard deviation of the additive noise. For multiplicative noise, the classification accuracy is shown in Tables 7, 8, 9, 10 and 11 for the same noise conditions. The boldfaced entries in the tables indicate the best performance under the corresponding noise condition, and the bottom row in each table indicates the number of times a method wins over all other methods. During the analysis of the results in the different noise scenarios, we observe the following.
In the case of \(\sigma =0.0\), i.e., the noise-free scenario, Tables 2 and 7 show that the performance of the proposed method is better than that of all other methods: Method I, Method II, Method III, and Method IV. However, the statistical significance test shows that the proposed method is not significantly better than Method III. Comparing the proposed method with Method IV, we observe that the proposed width selection mechanism, i.e., the modified p-NN, is better than the original p-NN-based width selection, which is also supported by the statistical test. Hence, we can conclude that the overall performance of the proposed method is better than that of all other methods.
In the additive noise scenario, when the noise standard deviation is \(\sigma =0.1\), the proposed method performs best for six out of ten data sets. When \(\sigma =0.2\) or \(\sigma =0.3\), the proposed method performs best for eight data sets. For \(\sigma =0.4\), the proposed method performs best for nine data sets, the exception being the Iris data set. All the classes of the Iris data set are nearly well separated, which may be the reason behind this result and the good performance of Method IV.
In the multiplicative noise scenario, when the noise standard deviation is \(\sigma =0.1\) or \(\sigma =0.2\), the performance of the proposed method is better than that of all other methods, whereas the performances of Method II, Method III, and Method IV are similar. The proposed method performs best for eight and nine data sets when the noise standard deviation is \(\sigma =0.3\) and \(\sigma =0.4\), respectively. Method IV performs best for the Iris data set in both cases.
Finally, from all the tables, we can conclude that in both the additive and multiplicative noise scenarios, the advantage of the proposed method grows as the noise standard deviation increases.
5.2.1 Testing with outliers
To test the performance of the proposed method against outliers, it has been evaluated on data sets containing outliers. For this purpose, we have randomly changed the class membership of \(5\%\) of the input patterns to some other class. These input patterns act as outliers in the data set. With the modified data sets including the outliers, we have computed the classification accuracy for all five methods, including the proposed one. The simulation results are shown in Table 12. The boldfaced entries in the table indicate the best performance, and the bottom row indicates the number of times a method wins over all other methods. It shows that, among all five methods, Method IV performs best for four data sets and the proposed method performs best for six data sets. As the results are similar whether we use the original p-NN or the modified p-NN for the selection of the widths along with the proposed center selection method, we can state that the robustness of the proposed method with respect to outliers primarily depends on the proposed center selection method.
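A minimal sketch of this outlier-injection step; the function name and seed are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_outliers(y, n_classes, frac=0.05):
    """Randomly reassign the class membership of `frac` of the patterns
    to some other class, so that they act as outliers."""
    y = y.copy()
    flip = rng.choice(len(y), size=int(frac * len(y)), replace=False)
    for i in flip:
        others = [c for c in range(n_classes) if c != y[i]]
        y[i] = rng.choice(others)
    return y
```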
5.3 Statistical significance: Wilcoxon signed-rank test
For comparison purposes, the Wilcoxon signed-rank test is applied to each pair of methods using the results reported in Tables 2, 3, 4, 5, 6, 7, 8, 9, 10 and 11. The results of the Wilcoxon signed-rank test are summarized in Tables 13 and 14, respectively, for additive and multiplicative noises. In each of the tables, the notation i:j labeling a column represents the statistical comparison between Method i and Method j. The symbols \(\prec\), \(\approx\), \(\succ\), used in the tables, indicate, respectively, that Method i is significantly superior to Method j, that there is no significant difference between Method i and Method j, and that Method i is significantly inferior to Method j. For the proposed method, \(j=5\). For both additive and multiplicative noises, the results of the statistical significance tests support our observations in the previous subsection. From Tables 13 and 14, we observe that the proposed center selection method with the original p-NN-based width selection method, i.e., Method IV, works significantly better than Method I, Method II, and Method III as the noise level increases (e.g., \(\sigma =0.4\)). Further, the modified p-NN-based width selection method combined with the proposed center selection method, i.e., the proposed method, improves the robustness of the model even for low-variance noises. Thus, from these statistical tests, we conclude that the combination of the proposed center and width selection methods performs best with respect to input perturbations compared to the other methods. Note that the last columns in Tables 13 and 14 show the importance of the combination of the proposed center and width selection methods.
The Wilcoxon signed-rank test has also been applied to the data sets with outliers, and the test results are shown in Table 15. Table 15 also confirms the superiority of the proposed method over the other methods.
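A sketch of one such pairwise test using SciPy; the paired accuracies below are placeholders, not values from Tables 2-11.

```python
from scipy.stats import wilcoxon

# Placeholder paired accuracies of two methods over the ten data sets.
acc_method_iii = [0.91, 0.84, 0.78, 0.88, 0.95, 0.70, 0.82, 0.77, 0.90, 0.86]
acc_proposed   = [0.93, 0.87, 0.80, 0.90, 0.96, 0.74, 0.85, 0.80, 0.92, 0.88]

stat, p_value = wilcoxon(acc_method_iii, acc_proposed)
print(f"W = {stat}, p = {p_value:.4f}")   # p < 0.05 -> significant difference
```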
6 Conclusions
In this article, we have proposed an approach to make the RBFN robust with respect to additive and multiplicative input perturbations. For this purpose, a separate SOM network has been trained with the samples of each class present in the data set. Random Gaussian noise has been injected into the input patterns used to train the SOM networks. The weight vectors of each SOM network are then used as the centers of the hidden layer RBF units of that particular class. We have determined the width of an RBF unit as the scaled distance between the center of the RBF unit and the p nearest centers of the RBF units belonging to the same class. This process makes the RBF network robust, as the outputs are determined by specific hidden units only. As noise is considered during training, the hyper-spheres of the RBF units become smooth, making the RBF network robust: the smoother the surface of the hyper-sphere, the more immune the corresponding RBF unit is to noise. Further, the modified p-NN-based width selection technique reduces the sensitivity of the RBF units with respect to the inter-class separation boundaries. The simulation results have empirically established the superiority of the proposed approach for attaining RBF networks that are robust to additive and multiplicative input noises. We have also empirically validated that the proposed approach is robust against outliers present in the data set.
The performance of this network also depends on the selection of the weights between the hidden and output layers of the RBF network. In the future, we intend to modify the learning mechanism for the selection of the weights between the hidden and output layers to make the network more robust with respect to additive and multiplicative noises in the weights.
References
Lowe D (1988) Multi-variable functional interpolation and adaptive networks. Complex Syst 2:321–355
Saha A, Wu CL, Tang DS (1993) Approximation, dimension reduction, and nonconvex optimization using linear superpositions of Gaussians. IEEE Trans Comput 42(10):1222–1233
Bernier JL, Díaz AF, Fernández F, Cañas A, González J, Martin-Smith P, Ortega J (2003) Assessing the noise immunity and generalization of radial basis function networks. Neural Process Lett 18(1):35–48
Webb AR (1994) Functional approximation by feed-forward networks: a least-squares approach to generalization. IEEE Trans Neural Netw 5(3):363–371
Haykin S (1999) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall, Upper Saddle River
Eickhoff R, Rückert U (2007) Robustness of radial basis functions. Neurocomputing 70(16):2758–2767
Yu H, Xie T, Paszczynski S, Wilamowski BM (2011) Advantages of radial basis function networks for dynamic system design. IEEE Trans Ind Electron 58(12):5438–5450
Cover TM (1965) Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans Electron Comput EC-14(3):326–334
Moody J, Darken CJ (1989) Fast learning in networks of locally-tuned processing units. Neural Comput 1(2):281–294
Bishop C (1991) Improving the generalization properties of radial basis function neural networks. Neural Comput 3(4):579–588
Chen S, Cowan CF, Grant PM (1991) Orthogonal least squares learning algorithm for radial basis function networks. IEEE Trans Neural Netw 2(2):302–309
Whitehead BA, Choate TD (1996) Cooperative-competitive genetic evolution of radial basis function centers and widths for time series prediction. IEEE Trans Neural Netw 7(4):869–880
Schölkopf B, Sung KK, Burges CJ, Girosi F, Niyogi P, Poggio T, Vapnik V (1997) Comparing support vector machines with Gaussian kernels to radial basis function classifiers. IEEE Trans Signal Process 45(11):2758–2765
Mao K (2002) RBF neural network center selection based on Fisher ratio class separability measure. IEEE Trans Neural Netw 13(5):1211–1217
Mao KZ, Huang GB (2005) Neuron selection for RBF neural network classifier based on data structure preserving criterion. IEEE Trans Neural Netw 16(6):1531–1540
Orr MJ (1995) Regularization in the selection of radial basis function centers. Neural Comput 7(3):606–623
Cohen S, Intrator N (2000) Global optimization of RBF networks. http://www.cs.tau.ac.il/~nin/papers/rbf.pdf (cf. p. 156)
Schwenker F, Kestler HA, Palm G (2001) Three learning phases for radial-basis-function networks. Neural Netw 14(4):439–458
Fritzke B (1994) Growing cell structures - a self-organizing network for unsupervised and supervised learning. Neural Netw 7(9):1441–1460
Anouar F, Badran F, Thiria S (1998) Probabilistic self-organizing map and radial basis function networks. Neurocomputing 20(1):83–96
Bouchired S, Ibnkahla M, Roviras D, Castanié F (1998) Equalization of satellite mobile communication channels using combined self-organizing maps and RBF networks. In: Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing, 1998, vol 6. IEEE, pp 3377–3379
Andrieu C, De Freitas N, Doucet A (2001) Robust full Bayesian learning for radial basis networks. Neural Comput 13(10):2359–2407
Townsend NW, Tarassenko L (1999) Estimations of error bounds for neural-network function approximators. IEEE Trans Neural Netw 10(2):217–230
Ikonomopoulos A, Endou A (1998) Wavelet decomposition and radial basis function networks for system monitoring. IEEE Trans Nuclear Sci 45(5):2293–2301
Lee CC, Chung PC, Tsai JR, Chang CI (1999) Robust radial basis function neural networks. IEEE Trans Syst Man Cybern B Cybern 29(6):674–685
Bruzzone L, Prieto DF (1999) A technique for the selection of kernel-function parameters in RBF neural networks for classification of remote-sensing images. IEEE Trans Geosci Remote Sens 37(2):1179–1184
Ho KI, Leung CS, Sum J (2010) Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks. IEEE Trans Neural Netw 21(6):938–947
Tinós R, Terra MH (2001) Fault detection and isolation in robotic manipulators using a multilayer perceptron and an RBF network trained by Kohonen’s self-organizing map. Rev Soc Bras Autom Contr Autom 12(1):11–18
Shi D, Yeung DS, Gao J (2005) Sensitivity analysis applied to the construction of radial basis function networks. Neural Netw 18(7):951–957
Yeung DS, Chan PP, Ng WW (2009) Radial basis function network learning using localized generalization error bound. Inf Sci 179(19):3199–3217
Tu S, Ben K, Tian L, Zhang L (2008) Combination of SOM and RBF based on incremental learning for acoustic fault identification of underwater vehicles. In: Congress on image and signal processing, 2008 (CISP’08), vol 4. IEEE, pp 38–42
Yao W, Chen X, Luo W (2009) A gradient-based sequential radial basis function neural network modeling method. Neural Comput Appl 18(5):477–484
Han H, Chen Q, Qiao J (2010) Research on an online self-organizing radial basis function neural network. Neural Comput Appl 19(5):667–676
Ding S, Xu L, Su C, Jin F (2012) An optimizing method of RBF neural network based on genetic algorithm. Neural Comput Appl 21(2):333–336
Hartono P (2016) Classification and dimensional reduction using restricted radial basis function networks. Neural Comput Appl. doi:10.1007/s00521-016-2726-5
Bishop CM (1995) Training with noise is equivalent to Tikhonov regularization. Neural Comput 7(1):108–116
Wahba G (1990) Spline models for observational data, vol 59. SIAM, Philadelphia
Frank A, Asuncion A (2010) UCI machine learning repository, vol 213. University of California, Irvine
Ethics declarations
Conflict of interest
We have tried our best to minimize the overlap between this manuscript and published articles in terms of fragments of sentences and technical terms. We do not have any conflict of interest.