Abstract
In this article, we have proposed a methodology for making a radial basis function network (RBFN) robust with respect to additive and multiplicative input noises. This is achieved by properly selecting the centers and widths for the radial basis function (RBF) units of the hidden layer. For this purpose, firstly, a set of self-organizing map (SOM) networks is trained for center selection. For training a SOM network, random Gaussian noise is injected into the samples of each class of the data set. The number of SOM networks is the same as the number of classes present in the data set, and each of the SOM networks is trained separately by the samples belonging to a particular class. The weight vector associated with a unit in the output layer of the SOM network corresponding to a class is used as the center of an RBF unit for that class. To determine the widths of the RBF units, the p-nearest neighbor algorithm is used class-wise. Proper selection of centers and widths makes the RBFN robust with respect to input perturbation and to outliers present in the data set. The weights between the hidden and output layers of the RBFN are obtained by the pseudo-inverse method. To test the robustness of the proposed method in additive and multiplicative noise scenarios, ten standard data sets have been used for classification. The proposed method has been compared with three existing methods, in which the centers are generated in three ways: randomly, using the k-means algorithm, and using a SOM network. Simulation results show the superiority of the proposed method over those methods. The Wilcoxon signed-rank test also shows that the proposed method is statistically better than those methods.
1 Introduction
Radial basis function network (RBFN), a popular artificial neural network (ANN), has been used in many applications [1, 2]. However, like other neural network models, it is not free from input perturbation. It may encounter input perturbation in real-life applications; e.g., inputs that come from electronic sensors, such as microphones or thermocouples, may be altered [3]. This alteration can be additive [4] or multiplicative [3]. When the perturbation of an input is proportional to its magnitude, it is called multiplicative perturbation, and when the perturbation is an additive Gaussian noise, it is called additive perturbation. The conventional RBFN is sensitive to changes in its input [4]. It can be made robust to input perturbation by proper selection of the RBF parameter values [5, 6].
The dimensionality of the hidden layer is an important parameter for making an RBFN efficient. If the number of RBF units in the hidden layer is not sufficient, it may cause underfitting [7]. According to Cover’s theorem [8], the number of units in the hidden layer should be more than the number of features of the input pattern to avoid underfitting. So, we have used \(n \times C\) RBF units in the hidden layer, where n and C are, respectively, the number of features and the number of classes present in the data set. This makes the number of RBF units C times the number of features. The choice of \(n \times C\) RBF units allots the same number of units, n, to each class, so that the data representing a class are distributed uniformly among the RBF units.
In this article, we have proposed an RBFN which is robust with respect to input alterations. Here, a self-organizing map (SOM)-based clustering algorithm has been used to select the centers of the RBF units in the hidden layer of the RBFN. A SOM network has been trained using the samples belonging to a particular class of the data set, and C such SOM networks have been trained separately for the C classes. For training a SOM network, independent random Gaussian noise has been injected into the input samples of the corresponding class. We consider n units in the output layer of the SOM, which yields n weight vectors from each of the trained SOM networks. These weight vectors are used as n cluster centers for a particular class. So, for each class, we have a set of n centers corresponding to n RBF units. We select all the cluster centers corresponding to the C classes. Thus, for C classes, the number of RBF units in the hidden layer is \(n \times C\). For the selection of the widths of the RBF units, the p-nearest neighbor (p-NN) algorithm has been modified and used class-wise. Finally, the pseudo-inverse method has been used to compute the weights between the hidden and output layers of the RBFN. The proposed method has been tested for robustness on ten standard classification data sets. The experimental results establish the ability of the proposed method to make the network robust with respect to multiplicative and additive input noises.
The remainder of this article is organized as follows. In Sect. 2, the RBFN and the SOM network are described in brief. Some of the works in the literature associated with the robustness and generalization ability of the RBFN are reviewed in Sect. 3. In Sect. 4, we illustrate the proposed method; the philosophy behind it is also discussed in this section. The simulation results and comparisons with other existing methods are provided in Sect. 5. Finally, we draw the conclusions in Sect. 6.
2 The preliminaries
2.1 Fundamentals of the RBF network
A generic model of an RBFN with three layers is presented here. We consider a data set \({\mathcal {D}} = \{({\mathbf {x}}^p,{\mathbf {d}}^p)\}^{P}_{p=1}\), where the input pattern \({\mathbf {x}}^p \in \mathbb {R}^n\) and the corresponding desired output \({\mathbf {d}}^p \in \mathbb {R}^J\). P, n, and J are, respectively, the number of patterns present in the data set, the number of features present in a pattern, and the number of outputs. The RBFN consists of n linear nodes in the input layer. Each node of the hidden layer of the RBFN is called an RBF unit. Each RBF unit is associated with a local receptive field, and each receptive field is associated with a center and a width. The dimension of the center is the same as the input dimension, i.e., n. The widths are the scaling factors for the outputs of the RBF units. The output layer consists of J linear output nodes. For an input pattern \({\mathbf {x}}^p = \left( x^p_{1}, x^p_{2}, \ldots , x^p_{n}\right)\), the RBFN, with radial basis activation functions (kernel basis functions) in the hidden layer, produces an output \({\mathbf {z}}^p = \left( z^p_{1}, z^p_{2}, \ldots , z^p_{H}\right)\). Here, H is the number of RBF units in the RBFN. A linear weighted summation of the outputs of the hidden units produces the approximated output \({\mathbf {o}}^p = \left( o^p_{1}, o^p_{2}, \ldots , o^p_{J}\right)\) of the RBFN. The basic architecture of the RBF network is shown in Fig. 1. The function of the RBFN is described in the following three steps.
1. Each input \(x^p_i\) in the input layer is scaled by the weight \(w_{ih}\), as given in (1), where \(w_{ih}\) is the weight between the ith unit of the input layer and the hth RBF unit of the hidden layer.

$$\begin{aligned} x^p_{i_h} = x^p_i w_{ih} \end{aligned}$$
(1)

Thus, the vector \({\mathbf {x}}^p_h = \left( x^p_{1_h}, x^p_{2_h}, \ldots , x^p_{n_h}\right)\) is the scaled input to the hth RBF unit.
2. The output of the hth RBF unit of the hidden layer is given in (2).

$$\begin{aligned} z^{p}_{h} = \varphi \left( {\mathbf {x}}^{p}_h, \mathbf {c}_{h}, \sigma _{h}\right) = \exp \left( -\frac{||{\mathbf {x}}^p_h - \mathbf {c}_h||^{2}}{\sigma _h^{2}}\right) \end{aligned}$$
(2)

Here, \(\varphi \left( \cdot \right)\) is the kernel basis function and \(||\cdot ||\) is the \(L^2\) norm of the operand vectors. \(\mathbf {c}_h\) and \(\sigma _h\) are, respectively, the center and the width of the hth RBF unit.
3. The network output for the input pattern \({\mathbf {x}}^p\) is the sum of the weighted outputs of the RBF units, as given in (3).

$$o^p_j = \sum _{h=1}^{H} z^p_h w_{hj} + w_0$$
(3)

Here, \(w_{hj}\) represents the weight between the hth RBF unit and the jth output unit, and \(w_0\) is the weight of the bias unit of the hidden layer.
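As an illustration, the following is a minimal NumPy sketch of the forward pass in (1)-(3); the function and variable names are ours, and the input-layer weights \(w_{ih}\) are taken as 1, as in the common RBFN formulation, so that \({\mathbf {x}}^p_h = {\mathbf {x}}^p\).

```python
import numpy as np

def rbf_forward(X, centers, widths, W, w0):
    """Forward pass of the three-layer RBFN in (1)-(3).

    X       : (P, n) input patterns
    centers : (H, n) centers c_h of the RBF units
    widths  : (H,)   widths sigma_h of the RBF units
    W       : (H, J) hidden-to-output weights w_hj
    w0      : (J,)   bias weight(s) of the hidden layer
    """
    # Eq. (2): z_h = exp(-||x - c_h||^2 / sigma_h^2)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (P, H)
    Z = np.exp(-sq_dist / widths[None, :] ** 2)
    # Eq. (3): linear weighted summation of the hidden outputs plus bias
    return Z @ W + w0
```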
2.2 Fundamentals of the SOM network
A SOM network consists of two layers: an input layer and an output layer. The n features of the input pattern \({\mathbf {x}}^p = \left( x^p_1, x^p_2, \ldots , x^p_{n}\right)\) are fed into the input layer of the SOM network. Thus, the input layer of the SOM network consists of n linear nodes. The output layer of a SOM network, in general, is a multi-dimensional lattice. In this study, we consider a one-dimensional lattice/grid of size m. Each unit in the output layer is denoted by \(u_j\, (j = 1, 2, \ldots , m)\). The input layer is fully connected with the output layer. The connection between the ith unit of the input layer and the jth unit of the output layer is associated with a weight \(w_{ij}\). Figure 2 illustrates the basic architecture of the SOM network.
A weight vector corresponding to the jth output unit is represented by \({\mathbf {w}}_j = [w_{1j}, w_{2j}, \ldots , w_{nj}]\). A similarity measure \(\texttt {d}_j\) between the input pattern \({\mathbf {x}}^p\) and the weight vector \({\mathbf {w}}_j\) for the output unit j is usually computed as \(\texttt {d}_j = ||{\mathbf {x}}^p-{\mathbf {w}}_j||\). The output unit with the smallest \(\texttt {d}_j\) is the winner for the input pattern \({\mathbf {x}}^p\). The weights of the winner, along with the weights of the units in its topological neighborhood, are updated in the direction of the input pattern. The influence of the update decays exponentially with distance from the winner, which is at the center of the topological neighborhood. The Gaussian function, presented in (4), is generally used as the neighborhood function.

$$G\left( u_j, \hat{u}\right) = \exp \left( -\frac{d^{2}\left( u_j, \hat{u}\right) }{2\sigma ^{2}}\right)$$
(4)

Here, \(G\left( \cdot \right)\) is the topological neighborhood function, \(\hat{u}\) and \(u_j\) are, respectively, the winning unit and the jth unit of the output layer of the SOM network, \(d\left( u_j, \hat{u}\right)\) is the lattice distance between them, and \(\sigma\) is the width of the neighborhood function. The weight updating formula is given in (5).

$$\mathbf {w}_j^{t+1} = \mathbf {w}_j^{t} + \eta ^t\, G\left( u_j, \hat{u}\right) \left( {\mathbf {x}}^p - \mathbf {w}_j^{t}\right)$$
(5)

Here, \(\eta ^t\) and \({\mathbf {w}}_j^{t}\) are, respectively, the learning rate and the weight vector at time t. With repeated presentations of the training patterns, the weight vectors move toward the input patterns. The updating of the neighbors' weights preserves the topology of the input patterns in the output layer.
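A short sketch of one such competition-and-update step on a one-dimensional lattice, written against (4) and (5); the exponential shrinking of the neighborhood width and of the learning rate over time is a common convention assumed here, not a prescription of this paper.

```python
import numpy as np

def som_step(W, x, t, T, eta0=0.9, sigma0=None):
    """One Kohonen competition-and-update step on a 1-D lattice of m units.

    W : (m, n) weight vectors w_j    x : (n,) input pattern
    t : current step                 T : time constant of the decay
    """
    m = W.shape[0]
    if sigma0 is None:
        sigma0 = m / 2.0
    # Competition: the winner minimizes d_j = ||x - w_j||.
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    # Gaussian neighborhood, Eq. (4), with width shrinking over time.
    sigma_t = sigma0 * np.exp(-t / T)
    G = np.exp(-((np.arange(m) - winner) ** 2) / (2.0 * sigma_t ** 2))
    # Weight update, Eq. (5), moving each w_j toward x.
    eta_t = eta0 * np.exp(-t / T)
    W += eta_t * G[:, None] * (x - W)
    return W
```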
3 The literature
Proper selection of the centers and widths of the basis functions associated with the RBF units plays a significant role in the robustness and generalization of the RBFN. In the literature, many notable works have dealt with the parameter selection of RBFN to improve the generalization [9,10,11,12,13,14,15,16,17,18,19,20,21] and robustness [3, 4, 7, 22,23,24,25,26,27,28,29] of the network. In this section, we discuss some of these related works in brief.
Moody and Darken [9] have used k-means clustering and the p-NN heuristic to compute, respectively, the centers and the widths of the receptive fields in the hidden layer. The p-NN-based width selection method helps to achieve overlapping kernel functions between each RBF unit and its neighbors. Thus, the RBF units form a smooth and contiguous interpolation over the input space. They have shown that the centers and widths of the RBF units control the generalization abilities of the RBFN. In [10], the author optimized the RBF centers via the expectation maximization algorithm, where the centers were initialized by the centroids obtained from a clustering algorithm. In [11], Chen et al. have proposed the orthogonal least squares (OLS) method to obtain optimal centers for the RBF units. For this purpose, the regressors, i.e., fixed functions of the training vectors, were first transformed into orthogonal basis vectors (OBVs). Later, a subset of these OBVs was selected as centers of the RBF units by a forward regression method. The authors in [12] have used a genetic algorithm for the selection of the centers and widths of RBFN. Though their algorithm is robust with respect to local minima owing to its global population-based search, it becomes slow when the search space is large. Support vectors, generated by a support vector machine (SVM), have been used as centers of RBF units in [13]. Simulation results have shown that this method outperforms k-means clustering-based RBFN. A hierarchical full Bayesian approach for RBFN has been proposed in [22]. In the computation of the joint distribution of the RBF parameters and the number of basis functions, the authors have considered the following: (i) the Markov chain Monte Carlo (MCMC) algorithm when the number of RBF units is known and (ii) the reversible-jump MCMC algorithm when the number of RBF units is unknown. They have proved the convergence of the algorithms and shown that they are robust against the specification of the prior. Mao [14] has selected RBF centers based on the class separability obtained by the Fisher ratio. Here, the Fisher ratio has been incorporated into an orthogonal transform and a forward selection procedure to select the RBF centers. To evaluate the class separability provided by each RBF unit independently, the correlations among the outputs of the RBF units were decoupled. The empirical results demonstrate that this algorithm obtains a reduced set of RBF units which increases the class separation. The authors in [15] reduced the hidden layer space while preserving the major structure of the data. They have preserved the relative locations of the samples of the input space, including those samples which are close to the boundaries between different classes. This is achieved through the selection of the centers of the basis functions. Fisher's class separation criterion has been used for the selection of the widths of the basis functions. Simulation results demonstrate that their method improves the generalization of the RBFN, though training the RBFN is computationally expensive due to the involvement of repeated singular value decompositions.
Webb [4] has used a noise-dependent regularizer with the sum-of-squared-error function to make the function approximation robust in the presence of noise in the test patterns. Given the pth data pattern \(x^p\), let \(E\left( x^p\right)\) be the error between the approximation \(f(x^p)\) and the desired value \(f^p\). Let the pattern \(x^p\) be perturbed by additive noise \(\aleph\). Assuming the value of \(\aleph\) is small and \(\mathbb {E}[\aleph ] = 0\), Taylor's theorem gives the expectation of the error in the presence of noise as in (6).

$$\mathbb {E}\left[ E\left( x^p + \aleph \right) \right] = E\left( x^p\right) + \frac{1}{2}\mathbb {E}\left[ \aleph ^{*}H^p\aleph \right]$$
(6)

\(E(x^p)\) is the error in the absence of noise and \(\frac{1}{2}\mathbb {E}\left[ \aleph ^{*}H^p\aleph \right]\) is the additional error term, where \(H^p\) is the Hessian matrix with respect to the input, evaluated for the pth pattern as given in (7).

$$H^p_{ij} = \left. \frac{\partial ^{2} E}{\partial x_i\, \partial x_j}\right| _{x = x^p}$$
(7)

For \(\mathbb {E}[\aleph _i\aleph _j] = \sigma ^2\delta _{ij}\), the additional error term is

$$\frac{\sigma ^2}{2}\, T_r\left( H^p\right)$$
(8)

where \(T_r\) is the matrix trace operation. Averaging over all data patterns gives the mean expected error as in (9).

$$\bar{E} = \frac{1}{P}\sum _{p=1}^{P}\left[ E\left( x^p\right) + \frac{\sigma ^2}{2}\, T_r\left( H^p\right) \right]$$
(9)
Orr [16] has used a weight-decay-regularizer-based forward selection method along with cross validation to select the centers of RBF units to improve the generalization of RBFN. A gradient descent-based learning method was used to fine-tune the centers and widths of the RBF units in [17]. Tuning was applied after evaluating the parameters by a clustering algorithm and pseudo matrix inversion. In [18], the authors have proposed a three-phase learning algorithm for RBFN. In the first phase, the centers and widths of the RBF units are determined. The weights between the hidden and output layers are determined in the second phase. The third phase is gradient descent-based error backpropagation, used to fine-tune all the RBF parameters. They have shown that the performance of a two-phase learning algorithm for RBFN is improved by adding the third phase. In [30], for the selection of the parameters of RBFN, the authors have minimized a localized generalization error bound for unseen patterns. For this purpose, the objective function in (10) was minimized; it gives an upper bound on the MSE for unknown patterns in the q-neighborhood of the training patterns.
Here \(\mathcal {R}_\mathrm{emp} = \frac{1}{P}\sum _{p=1}^{P}\left( {\mathbf {o}}^p-{\mathbf {d}}^p\right) ^2\) and \(\Delta Y_p=\left( {\mathbf {o}}-{\mathbf {o}}^p\right) ^2\), where \({\mathbf {o}}^p\) and \({\mathbf {d}}^p\) are, respectively, the obtained and desired outputs for the pth pattern. In the case of center and width selection, their method outperforms RBF networks trained by minimizing the training error only. Based on the Gladyshev theorem, Ho et al. [27] have proved the convergence of many online training algorithms, such as additive noise injection in the input and additive/multiplicative noise injection in the weights. They have shown that the objective function obtained by additive input noise injection during training is equivalent to a Tikhonov regularization-based approach.
Fritzke has used a growing cell self-organizing map in [19] for the selection of the centers of RBF units. The network adds a new unit whenever an input pattern is sufficiently far away from all existing RBF centers. This helps to find better positions of the centers of the RBF units with respect to generalization. Another variant of SOM, called the probabilistic self-organizing map, has been used in [20] for the parameter selection in RBFN to reduce the root-mean-square error. In [21], the authors have used SOM-based clustering and the k-means algorithm for the selection of the centers of RBFN to equalize time-varying nonlinear channels in a satellite UMTS channel. Empirically, they have shown that the performance is better in the case of SOM-based clustering. In [28], the authors performed forward selection and SOM-based clustering for the selection of the centers of RBF units. Using simulations, they have shown that an RBFN trained by SOM is less sensitive to the noise present in the input compared to the forward selection method. In [31], the authors have used an incremental SOM to find the centers of the RBF units in the hidden layer to improve the classification accuracy of the corresponding RBFN. Here, if any new samples are added to the network, the topology of the SOM network is modified by generating new cluster centers.
The effect of input perturbation and weight perturbation on the RBFN output is analyzed in [23]. The authors have determined the output covariance matrix of an RBFN by Taylor series expansion and measured the sensitivity of the output in the presence of perturbation. In [24], the authors have studied the importance of input perturbation using a wavelet adaptive neural network. They have estimated the sensitivity on an input as the ratio between the standard deviation of the prediction and that of the altered input, and shown that the sensitivity is highly correlated with the input perturbation. Lee et al., in [25], have proposed an RBFN which is robust with respect to functions that have a constant value over an interval of time. They have shown that the proposed RBFN is also robust against outliers. The activation function in their RBFN is a constant-valued function composed of a set of sigmoid functions. The new activation function was used with a robust objective function, given in (11).

$$E_R = \frac{1}{P}\sum _{p=1}^{P}\phi \left( e_p\right)$$
(11)
Here \(e_p = t_p - f(x_p)\) is the error for the pth training sample with desired output \(t_p\), \(\phi (\cdot )\) is a continuous function with constant \(\phi (0)\), and P is the total number of input samples. Note that \(E_R\) becomes the least squares criterion when \(\phi (e_p)=e_p^2\) and \(\phi (0)=0\). Bruzzone and Prieto [26] have used the k-means algorithm for class-wise selection of the centers of the receptive fields. For the selection of the widths of the receptive fields, they have used the p-NN algorithm only when the number of centers belonging to the class is more than p; otherwise, the standard deviation computed over all training samples belonging to that class is used. Bernier et al. [3] have studied the fault-tolerance behavior of the RBF network for both additive and multiplicative input perturbations. They [3] have proposed a measure of sensitivity, known as mean square sensitivity (MSS), which is defined in (12).
Here \(SS_i^m\) is the sensitivity of the ith unit in the mth layer of the network, P is the number of input patterns, and \(N_m\) is the number of neurons in the mth layer of RBFN. Using Taylor expansion, they have also defined the MSS for additive and multiplicative input noises \((\text {MSS}_\mathrm{in})\) as in (13).
Here, \(\varepsilon (p)\) is the instantaneous squared error for the pth input pattern and \(x^{p}_{k}\) is the kth feature of the pth input pattern. A low value of MSS indicates low sensitivity of the network to input perturbation. In [29], the authors have proposed an algorithm for sensitivity analysis of RBFN. The centers and the widths of the RBF units are determined by maximizing the output sensitivity over the training patterns. The least number of such hidden RBF units with maximal sensitivity represents the most generalized RBF network. The authors have defined the output deviation \(\Delta y_j\) as in (14).
Here, \(z_i\) is the response of the ith hidden unit. \(\hat{c}_i\) and \(\hat{\sigma }_i\) are, respectively, the center and the width of the ith hidden unit in the presence of noise. \(\hat{c}_i = c_i + \Delta c_i\) is the center perturbed by \(\Delta c_i\), and the interconnection weight under perturbation \(\Delta w_j\) is \(\hat{w}_j = w_j + \Delta w_j\). Here \(w_j\) has been computed using pseudo matrix inversion. The robustness of RBFNs with respect to noise in the input is analyzed in [6]. The authors have determined upper bounds on the MSE for noisy inputs and network parameters when the network parameters are constrained. These parameter constraints make the RBFN robust to noise. Moreover, they have presented a technique to identify highly sensitive parameters and inputs in the network. Yu et al. [7] have presented a review of different approaches to designing and training RBFN. They have also proposed a new approach, called the ErrCor algorithm, to find the proper size of the RBFN and the initialization of the centers of the RBF units. They have tested the robustness of the algorithm by injecting noise into the test patterns. Simulation results show that it improves the robustness of the network.
In [32], the authors have modeled a new gradient-based sequential RBFN. Here, they initially develop a meta-model of the RBFN with a small training set. Then, they compute the gradient over the current training set, add the points with maximum gradient to the current training set, and rebuild the model. These new points, added to the training set, help to capture the information of highly nonlinear regions, which in consequence improves the accuracy in each sequential step. Han et al., in [33], have proposed a variant of RBFN called the self-organizing RBF (SORBF) neural network, where both the network structure and the network parameters are variable. They used growing and pruning methods simultaneously to set the number of RBF nodes and the network weights. A node is added or removed based on its distance from the radius of the receptive field. Ding et al., in [34], have used a genetic algorithm to optimize the RBF network structure and weights. They have used binary encoding for the network structure and real encoding for the weights between the hidden and output layers. Further, they have fine-tuned the weights by the pseudo-inverse/least-mean-square method. Simulation results have shown that their method improves the generalization of RBFN. The author in [35] has proposed a new RBFN, called the restricted radial basis function network (rRBF), to visualize the patterns in a low-dimensional space and perform the classification task. The low-dimensional internal representation of rRBF helps to visualize the structure of the data set. Consequently, it helps to improve the generalization of the classifier by providing more training data to the regions where, according to the visualized internal representation, the classifier is weak. They have empirically demonstrated the dimensionality reduction and classification abilities of the rRBF network.
4 Proposed work
4.1 Mathematical model
RBFN is a local approximation network, where each RBF unit of the hidden layer acts as a local receptive field and the network output is determined by the local receptive fields. Proper selection of the centers and the widths of the kernel basis functions associated with the RBF units is important for making the RBFN robust. If the cluster centers lie in the overlapping regions of different classes, the local receptive fields will fail to approximate the class boundaries. Similarly, in the case of width selection, if the width of a basis function is larger than the width of the corresponding class region, it may fail to perform the classification task. In this paper, our prime objective is to find a set of centers and widths for the RBF units that makes the RBFN robust with respect to input perturbation. The proposed approach is explained below.
We have considered an input-output data set \({\mathcal {D}} =\{({\mathbf {x}}_p,{\mathbf {d}}_p)\}^{P}_{p=1}\) for the RBFN and partitioned it class-wise. A set of SOM networks has been trained with the samples of the data set, where the samples of a particular class are used to train a particular SOM network. Thus, the number of SOM networks is the same as the number of classes present in the data set. In the proposed method, the number of units in the output layer of a SOM network is the same as the number of features of the data set. So, the output layer of each SOM network is a one-dimensional lattice of size n, where n is the number of features present in an input sample.
Let \({\mathbf {a}}^1\), \({\mathbf {a}}^2\), \(\ldots\), \({\mathbf {a}}^P \in \mathbb {R}^n\) be P random Gaussian vectors. Before training the SOM networks, the random vectors (noise) are added to the corresponding input patterns \({\mathbf {X}}= ({\mathbf {x}}^1, {\mathbf {x}}^2, \ldots , {\mathbf {x}}^P)\). The random variables in each random vector \({\mathbf {a}}^i = (a^i_1, a^i_2, \ldots , a^i_n)\) are independent and follow a zero-mean Gaussian distribution with variance v, whose probability density is given in (15).

$$\Pr \left( a^i_k\right) = \frac{1}{\sqrt{2\pi v}}\exp \left( -\frac{\left( a^i_k\right) ^2}{2v}\right)$$
(15)
Here, \({\mathbf {a}}^i\) and \({\mathbf {a}}^j\) are independent for all \(i \ne j\). The perturbed input for pattern p is given in (16) and (17) for additive and multiplicative noises, respectively.

$$\bar{{\mathbf {x}}}^p = {\mathbf {x}}^p + {\mathbf {a}}^p$$
(16)

$$\bar{x}^p_i = x^p_i\left( 1 + a^p_i\right) , \quad i = 1, 2, \ldots , n$$
(17)
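A minimal sketch of this perturbation step, following (16) and (17); the helper name and the generator seed are ours.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only for reproducibility

def perturb(X, v, mode="additive"):
    """Perturb the patterns X with i.i.d. zero-mean Gaussian noise of
    variance v, per Eq. (16) (additive) or Eq. (17) (multiplicative)."""
    A = rng.normal(0.0, np.sqrt(v), size=X.shape)  # a^p_i ~ N(0, v)
    if mode == "additive":
        return X + A                 # Eq. (16)
    return X * (1.0 + A)             # Eq. (17): noise proportional to magnitude
```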
Then the SOM network is trained using the perturbed input patterns \(\mathbf {\bar{X}}\). Let a unit i in the input layer be connected to a unit j in the output layer by a weight \(w_{ij}\), and let \({\mathbf {w}}_j = [w_{1j}, w_{2j}, \ldots , w_{nj}]\) represent the weight vector corresponding to the jth unit in the output layer. We take the similarity measure \(\text {d}_j\) between the perturbed input \({\mathbf {{\bar{x}}}}^p\) and the weight vector \({\mathbf {w}}_j\) for each unit of the output layer using the \(L^2\) norm. If the output unit \(\hat{u}\) wins the competition, then we update only the weights of the winning unit along with the weights of the units in its topological neighborhood in the direction of the input pattern. The weight update, defined earlier in (5), is changed to (18) due to the perturbation.

$$\mathbf {w}_j^{t+1} = \mathbf {w}_j^{t} + \eta ^t\, G\left( u_j, \hat{u}\right) \left( \bar{{\mathbf {x}}}^p - \mathbf {w}_j^{t}\right)$$
(18)
Here, \(\eta ^t\) and \({\mathbf {w}}_j^{t}\) are, respectively, the learning rate and the weights at time t. The neighborhood function \(G\left( \cdot \right)\), used in (18), is defined earlier in (4). During the first \(80\%\) of the training time, we change the learning rate \(\eta ^t\) with time as given in (19)
where \(\eta ^0\) is the initial learning rate and \(\tau\) is set to half of the training time. For the remaining \(20\%\) of the training time, the learning rate \(\eta\) is fixed at 0.05.
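Since (19) is not reproduced here, the sketch below reads "reduced uniformly from 0.9 to 0.05" (Sect. 5.1.1) as a linear decay over the first 80% of the training time; the exact decay form with \(\tau\) set to half the training time follows (19) in the original.

```python
def learning_rate(t, T, eta0=0.9, eta_min=0.05):
    """Learning-rate schedule for SOM training: decay during the first 80%
    of the training time T, then fixed at eta_min (linear-decay reading)."""
    t80 = 0.8 * T
    if t < t80:
        return eta0 + (eta_min - eta0) * (t / t80)  # 0.9 -> 0.05 uniformly
    return eta_min
```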
The weight vectors from the trained SOM networks are used as the cluster centers for the units of the hidden layer. Each of the SOM networks has the same number of weight vectors. This makes the total number of RBF units in the hidden layer equal to the number of classes multiplied by the number of features present in the data set, and it ensures that each class has an equal number of RBF units in the hidden layer.
To compute the widths of the RBF units, we have modified the p-NN algorithm. In the original p-NN algorithm, the p nearest neighbors are considered irrespective of their class membership. In the modified algorithm, the p nearest neighbors belonging to the same class, among the n nearest neighbors of the corresponding RBF unit, are considered; here, n is the number of centers present in each class. The width is taken as the distance to the p nearest cluster centers, scaled by the number of centers present in that class. Let \(\sigma _{h}\) be the width of the hth RBF unit; then (20) gives the value of \(\sigma _{h}\).
Here, \(\mathbf {c}_i\) and \(\mathbf {c}_h\) are, respectively, the centers of the ith and hth RBF units. Once the centers and widths of the RBF units are determined, the weight matrix \({\mathbf {W}}\) between the hidden and output layers of the RBFN is obtained by the pseudo-inverse method as given in (21).

$${\mathbf {W}} = {\mathbf {H}}^{+}{\mathbf {D}}$$
(21)
Here, \({\mathbf {H}}\) is the hidden layer output matrix and \(^+\) denotes the pseudo-inverse operator. \({\mathbf {D}}\) is the matrix of desired outputs corresponding to the input patterns \({\mathbf {X}}\). Figure 3 illustrates the basic architecture of the proposed method.
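The following sketch puts the two steps together for one class: widths by the modified p-NN rule and output weights by (21). The root-mean-square form of the p-NN distance is an assumption (the exact form is given by (20) in the original); `np.linalg.pinv` computes the Moore-Penrose pseudo-inverse \({\mathbf {H}}^{+}\).

```python
import numpy as np

def class_widths(centers, p, n):
    """Widths of the n RBF units of one class via the modified p-NN rule:
    only the p nearest centers of the SAME class are used, and the result
    is scaled down by n (RMS distance form assumed)."""
    sigmas = np.empty(len(centers))
    for h, c in enumerate(centers):
        d = np.linalg.norm(centers - c, axis=1)
        nearest = np.sort(d)[1:p + 1]          # drop the zero self-distance
        sigmas[h] = np.sqrt(np.mean(nearest ** 2)) / n
    return sigmas

def output_weights(H, D):
    """Hidden-to-output weight matrix by the pseudo-inverse method, Eq. (21)."""
    return np.linalg.pinv(H) @ D               # W = H^+ D
```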
Different functions of the proposed algorithm corresponding to the different layers are given below.

Input layer: the computation at the ith neuron of this layer is the linear activation \({\mathcal {S}}\) applied to the input feature \(x^p_i\).

Hidden layer: the computation at the hth neuron of this layer is the Gaussian kernel activation \({\mathcal {G}}\) of (2), with the center \(\mathbf {c}_h\) obtained from the trained SOM networks and the width \(\sigma _h\) obtained from (20).

Output layer: the computation at the jth neuron of this layer is the linear combination \({\mathcal {L}}\) of the hidden layer outputs, as in (3).

Here, \(P_c\) is the number of patterns available in each class, and \({\mathcal {S}}, {\mathcal {G}}\), and \({\mathcal {L}}\) are, respectively, the activation functions of the input, hidden, and output layers. The weights of the SOM networks, trained with noisy inputs, are used as the centers of the RBF units. To keep the widths of the RBF units that are close to class boundaries small, we have scaled them by a factor of n, which makes the widths less sensitive to the class separation boundaries.
4.2 Input perturbation scheme for training
During training, for choosing the standard deviation of the perturbation vector (noise) judiciously, we have considered a set of three standard deviations, \(\Sigma_1 = \{0.05, 0.1, 0.5\}\), for experimentation. We have performed tenfold cross validation on the Sonar data set for these three values of the standard deviation and evaluated the performance of the proposed method. We have observed that the proposed method performs best for the standard deviation \(\sigma = 0.1\). Thus, we have selected the perturbation vector with zero mean and standard deviation \(\sigma =0.1\).
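A sketch of this selection procedure; `fit_and_score` is a hypothetical callback standing in for training the proposed RBFN with a given training-noise standard deviation and returning its test accuracy.

```python
import numpy as np
from sklearn.model_selection import KFold

def select_noise_std(X, y, fit_and_score, stds=(0.05, 0.1, 0.5), folds=10):
    """Pick the training-noise standard deviation from Sigma_1 by tenfold
    cross validation; fit_and_score(X_tr, y_tr, X_te, y_te, std) is assumed
    to train the proposed RBFN and return its test accuracy."""
    kf = KFold(n_splits=folds, shuffle=True, random_state=0)
    mean_acc = {}
    for std in stds:
        scores = [fit_and_score(X[tr], y[tr], X[te], y[te], std)
                  for tr, te in kf.split(X)]
        mean_acc[std] = np.mean(scores)
    return max(mean_acc, key=mean_acc.get)     # e.g., 0.1 on Sonar
```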
4.3 Philosophy behind the proposed work
If the centers of the RBF units are selected using all the training samples together, mixed kernel functions might arise. This generates mixed clusters, where a cluster may not be associated only with the samples of a particular class, and it reduces the separability between the classes in the kernel space. p-NN is used for the selection of the widths of the RBF units. If the kernel functions are located in the boundary regions between classes, overlapping cluster centers will affect the selection of the widths of the RBF units in the p-NN-based procedure [26]. In this study, the cluster centers of the RBF units are generated by SOM networks using the training patterns belonging to the same class. This avoids the generation of mixed kernel functions.
If the clustering is not done class-wise [26], the kernel functions may overlap. In that case, using p-NN for the selection of the widths of the RBF units is not a good approach. In this study, we have proposed the modified p-NN to keep the widths small for the kernel functions located in the boundary regions between classes and large for the kernel functions located at the centers of the class regions. This avoids overlapping of the kernel functions.
Noise injection during clustering by the SOM networks is a novel approach for the selection of the centers of RBF units. Noise injection in the input patterns is equivalent to the Tikhonov regularization approach [36], which in turn reduces the wiggling effect [37] in the hyper-spheres of the local receptive fields of the RBF units. It makes the curvature of the hyper-spheres of the local receptive fields smooth. Due to the smoothness of the hyper-spheres, a small alteration in the input samples will not produce a large deviation in the output. Thus, the network will be less sensitive to input noise and to outliers present in the input data set, which in turn makes the RBFN robust.
5 Experimentation
5.1 Experimental design
The results presented in this section are based on ten standard classification data sets taken from [38]. The information about all the data sets and their class-wise distributions is shown in Table 1.
To assess the robustness of the proposed method, we have compared it with three different center selection methods existing in the literature: (i) Method I uses class-wise random data point selection [1], (ii) Method II uses class-wise k-means clustering [26], and (iii) Method III uses class-wise SOM-based clustering [28]. For all three existing methods, we have considered the original p-NN-based width selection method. Note that our model proposes two methods: one for center selection and another for width selection. To analyze the performance of the proposed width selection method alone, we have considered Method IV, where the proposed center selection method is combined with the original p-NN-based width selection method.
In all five methods, including the proposed one, the number of RBF units in the hidden layer is kept the same. The simulations are carried out in MATLAB (Version 8.1). We have first applied z-score normalization to the training data set; the mean and standard deviation found there have then been used to perform the z-score normalization of the test data set. The Wilcoxon signed-rank test has been used to assess the statistical significance of each pair of algorithms.
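A minimal sketch of this normalization protocol; computing the statistics on the training split only avoids leaking test information.

```python
import numpy as np

def zscore_fit_apply(X_train, X_test):
    """z-score normalization with statistics estimated on the training set
    and applied unchanged to the test set."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0.0] = 1.0            # guard against constant features
    return (X_train - mu) / sd, (X_test - mu) / sd
```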
5.1.1 Parameters settings
In the RBFN, the weight vectors between the hidden and output layers are initialized within the range \([-0.5, 0.5]\). For training the SOM networks, we have initialized the learning rate \(\eta\) to 0.9. During the first \(80\%\) of the training time, the learning rate \(\eta\) is reduced uniformly from 0.9 to 0.05 using (19). For the remaining \(20\%\) of the training time, the learning rate \(\eta\) is fixed at 0.05. The number of weight updates is 500 times the number of samples present in the data set. We have performed tenfold cross validation ten times for each data set and consider the mean as the classification accuracy for that data set.
5.1.2 Input perturbation scheme for testing
To mimic the scenarios of (i) additive noise and (ii) multiplicative noise, discussed earlier in Sect. 4.1, we have altered the test inputs using Gaussian noise with zero mean. For comparison purposes, we have used five distinct values of the standard deviation \((\sigma)\) of the Gaussian noise to represent five different noise conditions: \(\Sigma_2 = \{0.0, 0.1, 0.2, 0.3, 0.4\}\). Note that when \(\sigma =0.0\), the input is unaltered, representing the noise-free scenario.
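A sketch of the resulting evaluation loop, reusing the `perturb` helper sketched in Sect. 4.1; `model_accuracy` is a hypothetical scoring callback.

```python
def noisy_test_accuracy(model_accuracy, X_test, y_test, perturb):
    """Score a trained classifier under the noise conditions of Sigma_2
    for both perturbation modes; sigma = 0.0 leaves the inputs unaltered."""
    results = {}
    for mode in ("additive", "multiplicative"):
        for std in (0.0, 0.1, 0.2, 0.3, 0.4):
            Xn = X_test if std == 0.0 else perturb(X_test, std ** 2, mode)
            results[(mode, std)] = model_accuracy(Xn, y_test)
    return results
```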
5.2 Experimental results and analysis
The detailed results of the experiments are provided in Tables 2, 3, 4, 5, 6, 7, 8, 9, 10 and 11 for the ten data sets mentioned earlier. Tables 2, 3, 4, 5 and 6 show the classification accuracy for the five values of the standard deviation of the additive noise. For multiplicative noise, the classification accuracy is shown in Tables 7, 8, 9, 10 and 11 for the same noise conditions. The boldfaced entries in the tables indicate the best performance under the corresponding noise condition, and the bottom row in each table indicates the number of times a method wins over all other methods. During the analysis of the results in the different noise scenarios, we observe the following.
In the case of \(\sigma =0.0\), i.e., the noise-free scenario, Tables 2 and 7 show that the performance of the proposed method is better than that of all other methods: Method I, Method II, Method III, and Method IV. However, the statistical significance test shows that the proposed method is not significantly better than Method III. Comparing the proposed method with Method IV, we observe that the proposed width selection mechanism, i.e., the modified p-NN, is better than the original p-NN-based width selection, which is also supported by the statistical test. Hence, we can conclude that the overall performance of the proposed method is better than that of all other methods.
In the additive noise scenario, when the noise standard deviation is \(\sigma =0.1\), the proposed method performs best for six out of ten data sets. When \(\sigma =0.2\) or \(\sigma =0.3\), the proposed method performs best for eight data sets. For \(\sigma =0.4\), the proposed method performs best for nine data sets, the exception being the Iris data set. All the classes of the Iris data set are nearly well separated, which may be the reason behind this result and the good performance of Method IV.
In the multiplicative noise scenario, when the noise standard deviation is \(\sigma =0.1\) or \(\sigma =0.2\), the performance of the proposed method is better than that of all other methods, whereas the performances of Method II, Method III, and Method IV are similar. The proposed method performs best for eight and nine data sets when the noise standard deviation is \(\sigma =0.3\) and \(\sigma =0.4\), respectively. Method IV performs best for the Iris data set in both cases.
Finally, from all the tables, we can conclude that in both the additive and multiplicative noise scenarios, the advantage of the proposed method grows as the noise standard deviation increases.
5.2.1 Testing with outliers
To test the performance of the proposed method against outliers, it has been evaluated on data sets containing outliers. For this purpose, we have randomly changed the class membership of \(5\%\) of the input patterns to some other class. These input patterns act as outliers in the data set. With the modified data sets including the outliers, we have computed the classification accuracy for all five methods, including the proposed one. The simulation results are shown in Table 12. The boldfaced entries in the table indicate the best performance, and the bottom row indicates the number of times a method wins over all other methods. It shows that, among all five methods, Method IV performs best for four data sets and the proposed method performs best for six data sets. As the results are similar whether we use the original p-NN or the modified p-NN for the selection of the widths along with the proposed center selection method, we can state that the robustness of the proposed method with respect to outliers primarily depends on the proposed center selection method.
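A minimal sketch of this outlier-injection step; the function name and seed are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_outliers(y, n_classes, frac=0.05):
    """Randomly reassign the class membership of `frac` of the patterns
    to some other class, so that they act as outliers."""
    y = y.copy()
    flip = rng.choice(len(y), size=int(frac * len(y)), replace=False)
    for i in flip:
        others = [c for c in range(n_classes) if c != y[i]]
        y[i] = rng.choice(others)
    return y
```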
5.3 Statistical significance: Wilcoxon signed-rank test
For comparison purposes, the Wilcoxon signed-rank test is applied to each pair of methods using the results reported in Tables 2, 3, 4, 5, 6, 7, 8, 9, 10 and 11. The results of the Wilcoxon signed-rank test are summarized in Tables 13 and 14, respectively, for additive and multiplicative noises. In each of the tables, the notation i:j labeling a column represents the statistical comparison between Method i and Method j. The symbols \(\prec\), \(\approx\), \(\succ\), used in the tables, indicate, respectively, that Method i is significantly superior to Method j, that there is no significant difference between Method i and Method j, and that Method i is significantly inferior to Method j. For the proposed method, \(j=5\). For both additive and multiplicative noises, the results of the statistical significance tests support our observations in the previous subsection. From Tables 13 and 14, we observe that the proposed center selection method with the original p-NN-based width selection method, i.e., Method IV, works significantly better than Method I, Method II, and Method III as the noise level increases (e.g., \(\sigma =0.4\)). Further, the modified p-NN-based width selection method combined with the proposed center selection method, i.e., the proposed method, improves the robustness of the model even for low-variance noises. Thus, from these statistical tests, we conclude that the combination of the proposed center and width selection methods performs best with respect to input perturbations compared to the other methods. Note that the last columns in Tables 13 and 14 show the importance of the combination of the proposed center and width selection methods.
The Wilcoxon signed-rank test has also been applied to the data sets with outliers, and the test results are shown in Table 15. Table 15 also confirms the superiority of the proposed method over the other methods.
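A sketch of one such pairwise test using SciPy; the paired accuracies below are placeholders, not values from Tables 2-11.

```python
from scipy.stats import wilcoxon

# Placeholder paired accuracies of two methods over the ten data sets.
acc_method_iii = [0.91, 0.84, 0.78, 0.88, 0.95, 0.70, 0.82, 0.77, 0.90, 0.86]
acc_proposed   = [0.93, 0.87, 0.80, 0.90, 0.96, 0.74, 0.85, 0.80, 0.92, 0.88]

stat, p_value = wilcoxon(acc_method_iii, acc_proposed)
print(f"W = {stat}, p = {p_value:.4f}")   # p < 0.05 -> significant difference
```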
6 Conclusions
In this article, we have proposed an approach to make the RBFN robust with respect to additive and multiplicative input perturbations. For this purpose, a separate SOM network has been trained with the samples of each class present in the data set. Random Gaussian noise has been injected into the input patterns used to train the SOM networks. The weight vectors of each SOM network are then used as the centers of the hidden layer RBF units of that particular class. We have determined the width of an RBF unit as the scaled distance between the center of the RBF unit and the p nearest centers of the RBF units belonging to the same class. This process makes the RBF network robust, as the outputs are determined by specific hidden units only. As noise is considered during training, the hyper-spheres of the RBF units become smooth, making the RBF network robust: the smoother the surface of the hyper-sphere, the more immune the corresponding RBF unit is to noise. Further, the modified p-NN-based width selection technique reduces the sensitivity of the RBF units with respect to the inter-class separation boundaries. The simulation results have empirically established the superiority of the proposed approach for attaining RBF networks that are robust to additive and multiplicative input noises. We have also empirically validated that the proposed approach is robust against outliers present in the data set.
The performance of this network also depends on the selection of the weights between the hidden and output layers of the RBF network. In the future, we intend to modify the learning mechanism for the selection of the weights between the hidden and output layers to make the network more robust with respect to additive and multiplicative noises in the weights.
References
Lowe D (1988) Multi-variable functional interpolation and adaptive networks. Complex Syst 2:321–355
Saha A, Wu CL, Tang DS (1993) Approximation, dimension reduction, and nonconvex optimization using linear superpositions of Gaussians. IEEE Trans Comput 42(10):1222–1233
Bernier JL, Díaz AF, Fernández F, Cañas A, González J, Martin-Smith P, Ortega J (2003) Assessing the noise immunity and generalization of radial basis function networks. Neural Process Lett 18(1):35–48
Webb AR (1994) Functional approximation by feed-forward networks: a least-squares approach to generalization. IEEE Trans Neural Netw 5(3):363–371
Haykin S (1999) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall, Upper Saddle River
Eickhoff R, Rückert U (2007) Robustness of radial basis functions. Neurocomputing 70(16):2758–2767
Yu H, Xie T, Paszczynski S, Wilamowski BM (2011) Advantages of radial basis function networks for dynamic system design. IEEE Trans Ind Electron 58(12):5438–5450
Cover TM (1965) Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans Electron Comput EC-14(3):326–334
Moody J, Darken CJ (1989) Fast learning in networks of locally-tuned processing units. Neural Comput 1(2):281–294
Bishop C (1991) Improving the generalization properties of radial basis function neural networks. Neural Comput 3(4):579–588
Chen S, Cowan CF, Grant PM (1991) Orthogonal least squares learning algorithm for radial basis function networks. IEEE Trans Neural Netw 2(2):302–309
Whitehead BA, Choate TD (1996) Cooperative-competitive genetic evolution of radial basis function centers and widths for time series prediction. IEEE Trans Neural Netw 7(4):869–880
Schölkopf B, Sung KK, Burges CJ, Girosi F, Niyogi P, Poggio T, Vapnik V (1997) Comparing support vector machines with Gaussian kernels to radial basis function classifiers. IEEE Trans Signal Process 45(11):2758–2765
Mao K (2002) RBF neural network center selection based on Fisher ratio class separability measure. IEEE Trans Neural Netw 13(5):1211–1217
Mao KZ, Huang GB (2005) Neuron selection for RBF neural network classifier based on data structure preserving criterion. IEEE Trans Neural Netw 16(6):1531–1540
Orr MJ (1995) Regularization in the selection of radial basis function centers. Neural Comput 7(3):606–623
Cohen S, Intrator N (2000) Global optimization of RBF networks. http://www.cs.tau.ac.il/~nin/papers/rbf.pdf (cf. p. 156)
Schwenker F, Kestler HA, Palm G (2001) Three learning phases for radial-basis-function networks. Neural Netw 14(4):439–458
Fritzke B (1994) Growing cell structures - a self-organizing network for unsupervised and supervised learning. Neural Netw 7(9):1441–1460
Anouar F, Badran F, Thiria S (1998) Probabilistic self-organizing map and radial basis function networks. Neurocomputing 20(1):83–96
Bouchired S, Ibnkahla M, Roviras D, Castanié F (1998) Equalization of satellite mobile communication channels using combined self-organizing maps and RBF networks. In: Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing, 1998, vol 6. IEEE, pp 3377–3379
Andrieu C, De Freitas N, Doucet A (2001) Robust full Bayesian learning for radial basis networks. Neural Comput 13(10):2359–2407
Townsend NW, Tarassenko L (1999) Estimations of error bounds for neural-network function approximators. IEEE Trans Neural Netw 10(2):217–230
Ikonomopoulos A, Endou A (1998) Wavelet decomposition and radial basis function networks for system monitoring. IEEE Trans Nuclear Sci 45(5):2293–2301
Lee CC, Chung PC, Tsai JR, Chang CI (1999) Robust radial basis function neural networks. IEEE Trans Syst Man Cybern B Cybern 29(6):674–685
Bruzzone L, Prieto DF (1999) A technique for the selection of kernel-function parameters in RBF neural networks for classification of remote-sensing images. IEEE Trans Geosci Remote Sens 37(2):1179–1184
Ho KI, Leung CS, Sum J (2010) Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks. IEEE Trans Neural Netw 21(6):938–947
Tinós R, Terra MH (2001) Fault detection and isolation in robotic manipulators using a multilayer perceptron and an RBF network trained by Kohonen’s self-organizing map. Rev Soc Bras Autom Contr Autom 12(1):11–18
Shi D, Yeung DS, Gao J (2005) Sensitivity analysis applied to the construction of radial basis function networks. Neural Netw 18(7):951–957
Yeung DS, Chan PP, Ng WW (2009) Radial basis function network learning using localized generalization error bound. Inf Sci 179(19):3199–3217
Tu S, Ben K, Tian L, Zhang L (2008) Combination of SOM and RBF based on incremental learning for acoustic fault identification of underwater vehicles. In: Congress on image and signal processing, 2008 (CISP’08), vol 4. IEEE, pp 38–42
Yao W, Chen X, Luo W (2009) A gradient-based sequential radial basis function neural network modeling method. Neural Comput Appl 18(5):477–484
Han H, Chen Q, Qiao J (2010) Research on an online self-organizing radial basis function neural network. Neural Comput Appl 19(5):667–676
Ding S, Xu L, Su C, Jin F (2012) An optimizing method of RBF neural network based on genetic algorithm. Neural Comput Appl 21(2):333–336
Hartono P (2016) Classification and dimensional reduction using restricted radial basis function networks. Neural Comput Appl. doi:10.1007/s00521-016-2726-5
Bishop CM (1995) Training with noise is equivalent to Tikhonov regularization. Neural Comput 7(1):108–116
Wahba G (1990) Spline models for observational data, vol 59. SIAM, Philadelphia
Frank A, Asuncion A (2010) UCI machine learning repository, vol 213. University of California, Irvine
Ethics declarations
Conflict of interest
We have tried our best to minimize the overlap between this manuscript and published articles in terms of fragments of sentences and technical terms. We do not have any conflict of interest.