An oblique elliptical basis function network approach for supervised learning applications
Introduction
Supervised learning, including classification and regression, plays an important role in many fields [1], [2], [3], [4], [5], [6], [7], [8], [9]. Two kinds of classification tasks are encountered. One is single-label classification in which each instance belongs to only one category. The other one is multi-label classification in which an instance is allowed to belong to more than one category. For regression, one is usually interested in estimating the numerical relationship between a dependent variable and one or more independent variables.
Many methods based on machine learning techniques, such as k-nearest neighbors (KNN) [10], multilayer perceptrons (MLP) [11], and radial basis function (RBF) neural networks [12], have been proposed for single-label classification. KNN is a lazy learning method. For an unseen pattern, its k nearest neighbors in the feature space are found, and the pattern is assigned to the most common class among them. KNN is simple, but no training is done and all computation is deferred until classification. An MLP is a feedforward artificial neural network consisting of multiple layers of nodes, with each layer fully connected to the next. Each node in the hidden layers is a neuron with a nonlinear activation function, and the network is usually trained with a supervised learning technique called backpropagation. An RBF network uses radial basis functions as activation functions, and its output is a linear combination of the outputs of the radial basis functions. In [13], basis functions are interpreted as probability density functions and the weights as prior probabilities; models that output class-conditional densities or mixture densities are proposed, together with a training algorithm based on the expectation-maximization (EM) algorithm [14]. A hybrid training scheme for this type of RBF network is addressed in [15]. Some extensions of RBF neural networks, e.g., elliptical basis function (EBF) [16], [17] and versatile EBF (VEBF) [18] neural networks, have also been proposed. Fast training algorithms are used to learn a data set in only one pass, and the network structure is flexible and can be adjusted during training. However, the Euclidean or Mahalanobis distance is used, and the learning of output weights is not taken into consideration.
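To make the lazy-learning character of KNN concrete, the classification rule described above can be sketched in a few lines. This is only an illustrative sketch with hypothetical toy data, not code from any of the cited works:

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    # Distances from the unseen pattern to every training instance.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Labels of the k nearest neighbors.
    nearest = y_train[np.argsort(dists)[:k]]
    # Majority vote: the most common class among the k neighbors.
    return Counter(nearest.tolist()).most_common(1)[0][0]

# Toy data: two classes in a 2-D feature space (hypothetical numbers).
X_train = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, y_train, np.array([0.95, 1.0])))  # → 1
```

Note that nothing is learned in advance: every query repeats the full distance computation over the training set, which is exactly why testing is slow for large data.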
To deal with multi-label text classification, two approaches are mainly adopted [19], [20]. One transforms a multi-label classification task into several single-label classification tasks to which single-label classification methods can be applied [21], [22]. The other extends a specific single-label classification algorithm to handle multi-label data directly. One popular problem transformation method, called binary relevance, was proposed in [21]. The method transforms the original dataset into p datasets, where p is the number of categories associated with the original dataset. Each resulting dataset contains all instances of the original dataset with only two labels, “belonging to” or “not belonging to” a particular category. Since the resulting datasets are single-labeled, all single-label classification techniques are applicable to them. Several neural-network approaches adaptively designed for multi-label classification have been proposed [27], [19], [24], [25], [26], [23]. A kernel method for multi-label classification is proposed in [21]. The back-propagation multi-label learning (BP-MLL) [28] algorithm is a multi-label version of the back-propagation neural network; to handle multiple labels, label co-occurrence is incorporated into the pairwise ranking loss function. However, it has a complex global error function to be minimized. The multi-label k-nearest neighbors (ML-KNN) [29] is a lazy learning algorithm which requires a big run-time search. ML-RBF [30] is a multi-label RBF neural network which extends the traditional RBF learning algorithm. Multi-label with fuzzy relevance clustering (ML-FRC) [31] is a fuzzy relevance clustering based method for multi-label text categorization. Nam et al. [32] report that binary cross entropy can outperform the pairwise ranking loss by leveraging rectified linear units for nonlinearity. Kurata et al. [33] propose a neural network initialization method that treats some of the neurons in the final hidden layer as dedicated neurons for each pattern of label co-occurrence. These dedicated neurons are initialized to connect to the corresponding co-occurring labels with stronger weights than to others.
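The binary relevance transformation described above is mechanically simple. The following sketch (with a hypothetical 0/1 target matrix) shows how one multi-label target matrix becomes p single-label binary targets:

```python
import numpy as np

def binary_relevance_transform(Y):
    """Split an N x p multi-label target matrix (entries 0/1) into p
    binary target vectors, one per category. Each vector can then be
    paired with the original features and given to any binary classifier."""
    return [Y[:, j] for j in range(Y.shape[1])]

# Three instances, two categories; instance 0 belongs to both categories.
Y = np.array([[1, 1],
              [1, 0],
              [0, 1]])
per_label_targets = binary_relevance_transform(Y)
print(len(per_label_targets))  # → 2
```

A known limitation, consistent with the motivation for the adaptive methods cited above, is that this transformation treats the p resulting tasks independently and so ignores label co-occurrence.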
Various approaches based on KNN, MLP, and RBF neural networks have also been proposed for regression problems [10], [11], [12]. The KNN regression method is a nonparametric method which bases its prediction on the k nearest neighbors of the target to be forecasted. However, it lacks the capability of adaptation and its testing time is long. For the MLP and RBF networks used in regression, the outputs are numerical values instead of categories, and the network parameters are learned from the training data. However, determining the number of hidden nodes is a challenging issue. Fuzzy theory has also been applied to prediction [34]; however, membership functions must be determined, which is often a challenging task, and fuzzy theory by itself offers no learning. A neuro-fuzzy scheme combining fuzzy theory with a hybrid learning method is proposed for regression problems in [35]. A set of fuzzy IF-THEN rules is extracted from the training examples; a fuzzy neural network is then constructed accordingly, and its parameters are refined to increase the precision of the fuzzy rule base. Extreme learning machines (ELMs) [36], [37] have also been proposed to deal with regression problems. An ELM is a single-hidden-layer feedforward network. The input weights connecting the input layer to the hidden layer, as well as the biases of the hidden neurons, are assigned random values and are never trained. The output weights, connecting the hidden layer to the output layer, are determined by learning from the training patterns.
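The ELM scheme just described, random hidden parameters plus output weights fitted by least squares, can be sketched on a toy regression task. This is an illustrative sketch under common ELM conventions (tanh hidden activations, pseudo-inverse solution), not the implementation evaluated later in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: y = sin(x) sampled on [0, pi].
X = np.linspace(0, np.pi, 50).reshape(-1, 1)
y = np.sin(X).ravel()

L = 20  # number of hidden neurons
# Random input weights and hidden biases; these are never trained.
W = rng.normal(size=(1, L))
b = rng.normal(size=L)

def hidden(X):
    # Hidden-layer activations for all inputs.
    return np.tanh(X @ W + b)

# Output weights from a least-squares fit via the pseudo-inverse.
beta = np.linalg.pinv(hidden(X)) @ y

y_hat = hidden(X) @ beta
print(np.max(np.abs(y_hat - y)))  # small fit error on the training data
```

The single least-squares solve replaces iterative backpropagation, which is why ELM training is fast; the price is that the hidden representation is not adapted to the data.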
In this paper, we present a neural network architecture based on the oblique elliptical basis function for classification and regression problems. In classification, a category can involve several overlapping or disjoint sub-regions. We decompose the feature space into a collection of sub-regions which can then be combined to make up individual categories. An unseen instance is classified into a certain category if its similarity to the category exceeds a threshold. The whole framework fits in a five-layer network consisting of input, component-similarity, cluster-similarity, aggregation, and output layers. A similar idea also applies to solving regression problems. Training such a network proceeds in three steps: network initialization, parameter learning, and output activation. Firstly, a group of clusters is located for the given training data by a rotational self-constructing clustering algorithm. Secondly, a parameter learning algorithm based on least squares estimation is used to set the weights of the network. Finally, transfer functions are applied to derive the output response of the network. Many existing supervised learning methods restrict decision regions to be convex. For instance, the decision regions KNN [10] provides are convex: for a given input, the decision region can be regarded as the surrounding convex space containing its k nearest neighbors. The perceptron algorithm [38], [39] is another example, where the decision boundary separates convex decision regions. Support vector machines (SVM) [40], [41] are yet another example: in an SVM model, a hyperplane is derived to divide the space into two convex decision regions. In contrast, our proposed method allows the decision region associated with an output node to be disconnected or non-convex.
This work differs from [35], [31] in that the clustering algorithm allows the given data to be grouped into clusters expressed in the form of oblique elliptical basis functions. In [35], [31], standard elliptical basis functions are used: no rotation is applied to a basis function, and its axes are parallel to the coordinate axes. In this work, rotations are allowed and the axes need not be parallel to the coordinate axes. As a result, data can be clustered more appropriately by oblique elliptical basis functions than by standard ones. Consider the simple example in Fig. 1(a) containing 9 two-dimensional data points. A cluster with a standard elliptical basis function embracing these points is shown by the dotted curve in Fig. 1(b), while a cluster with an oblique elliptical basis function for the same points is shown in Fig. 1(c). The cluster of Fig. 1(c) fits the 9 data points better than that of Fig. 1(b), covering less extraneous space. Moreover, the work in [35], [31] applies only to either regression or classification problems, while the approach proposed in this paper is applicable to single-label classification, multi-label classification, as well as regression problems.
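The advantage of rotation illustrated in Fig. 1 can be reproduced numerically. The sketch below assumes a Gaussian-type elliptical basis function with a 2-D rotation angle theta; the values and the specific functional form are illustrative only, not the exact definition used by our method:

```python
import numpy as np

def elliptical_bf(x, mean, widths, theta=0.0):
    """Gaussian-type elliptical basis function in 2-D.
    theta = 0 gives a standard (axis-parallel) ellipse; a nonzero
    theta rotates the axes, giving an oblique ellipse."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])      # rotation matrix
    z = R.T @ (x - mean)                 # express x in the rotated frame
    return np.exp(-0.5 * np.sum((z / widths) ** 2))

mean = np.array([0.5, 0.5])
widths = np.array([0.4, 0.1])            # long axis, short axis
x = np.array([0.75, 0.75])               # a point lying along the diagonal

standard = elliptical_bf(x, mean, widths)             # axes parallel to coordinates
oblique = elliptical_bf(x, mean, widths, np.pi / 4)   # axes rotated 45 degrees
print(oblique > standard)  # → True
```

For data stretched along a diagonal, as in Fig. 1, the rotated basis function assigns the point a much higher similarity because the long axis is aligned with the data, whereas the axis-parallel ellipse must either miss such points or grow to cover extraneous space.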
The rest of this paper is organized as follows. The use of neural networks for supervised learning is addressed in Section 2. Our proposed neural network approach is described in Section 3. A simple example for illustration is given in Section 4. Experimental results are presented in Section 5. Finally, concluding remarks are given in Section 6.
Supervised learning with neural networks
Let X = {(x(i), y(i)) | 1 ≤ i ≤ N} be a finite set of N training instances, where x(i) = 〈x1(i), x2(i), …, xs(i)〉T is the input vector with s features and y(i) = 〈y1(i), y2(i), …, yp(i)〉T is the corresponding target vector with p components of the ith instance. Two cases are concerned:
- For classification problems, y(i) is a category vector defined as follows:
Proposed approach
In this section, we first present the neural network architecture for supervised learning and describe the function of each layer. Then the learning of the network parameters is developed. These parameters include the number of hidden nodes, the associated means and deviations, the output weights, as well as the output thresholds.
Our proposed network architecture is shown in Fig. 2 which consists of five layers: input layer (Layer 1), component-similarity layer (Layer 2), cluster-similarity
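The flow through the five layers can be sketched as a forward pass. The sketch below is only illustrative: it assumes Gaussian component similarities, a product combination per cluster, linear weighted aggregation, and a threshold at the output, with hypothetical parameter values; the precise similarity and transfer functions are those defined in this section:

```python
import numpy as np

def forward(x, means, devs, weights, thresholds):
    """Illustrative forward pass of a five-layer network of the kind
    described above (Gaussian component similarities assumed).
    means, devs: (J, s) centers and deviations of J hidden clusters.
    weights:     (J, p) output weights.  thresholds: (p,) output thresholds."""
    # Layer 2: per-feature (component) similarities for every cluster.
    comp = np.exp(-0.5 * ((x - means) / devs) ** 2)       # shape (J, s)
    # Layer 3: cluster similarity as the product over components.
    clus = np.prod(comp, axis=1)                          # shape (J,)
    # Layer 4: weighted aggregation toward each output node.
    agg = clus @ weights                                  # shape (p,)
    # Layer 5: an output fires if its aggregated value exceeds its threshold.
    return (agg > thresholds).astype(int)

# Two clusters in 2-D, two categories (hypothetical numbers).
means = np.array([[0.4, 0.6], [0.6, 0.3]])
devs = np.array([[0.15, 0.15], [0.15, 0.15]])
weights = np.array([[1.0, 0.0], [0.0, 1.0]])
thresholds = np.array([0.5, 0.5])
print(forward(np.array([0.42, 0.58]), means, devs, weights, thresholds))  # → [1 0]
```

An instance close to the first cluster center yields a high cluster similarity for that cluster and thus activates the first output node only, which is how sub-regions are combined into categories.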
Example of a network training
We give an example of single-label classification to illustrate how the proposed idea works. Suppose we have the following training instances:(x(1) = 〈0.55, 0.72〉T, y(1) = 〈1, 0〉T), (x(2) = 〈0.30, 0.60〉T, y(2) = 〈1, 0〉T), (x(3) = 〈0.70, 0.35〉T, y(3) = 〈0, 1〉T), (x(4) = 〈0.50, 0.52〉T, y(4) = 〈1, 0〉T), (x(5) = 〈0.58, 0.38〉T, y(5) = 〈0, 1〉T), (x(6) = 〈0.49, 0.40〉T, y(6) = 〈0, 1〉T), (x(7) = 〈0.78, 0.20〉T, y(7) = 〈0, 1〉T), (x(8) = 〈0.62, 0.25〉T, y(8) = 〈0, 1〉T), (x(9) = 〈0.40, 0.65〉T, y(9) = 〈1, 0〉T), (x(10) = 〈0.35, 0.38〉T, y(10) = 〈1, 0〉T), (x(11)
Experimental results
In this section, we present experimental results to show the effectiveness of our proposed approach, together with comparisons of testing accuracy and training time against other methods. Results for single-label classification are shown first, followed by those for multi-label classification and regression.
For convenience, we name our approach the oblique elliptical basis function network, abbreviated as OEBFN. We use a computer with an Intel(R) Core(TM) i7 CPU and 16 GB of RAM to
Concluding remarks
We have presented a five-layer neural network architecture based on the oblique elliptical basis function and an approach for setting up such networks for supervised learning applications. From a given set of training data, a group of clusters is located by a rotational self-constructing clustering algorithm. Through the incorporation of rotation, data in oblique elliptical distributions can be represented more appropriately than by standard elliptical basis functions. A parameter learning
Acknowledgments
This work was supported by the Ministry of Science and Technology under grant MOST-103-2221-E-110-047-MY2, and by the “Aim for the Top University Plan” of the National Sun Yat-Sen University and the Ministry of Education. The authors are grateful to the anonymous reviewers for their constructive comments, which greatly helped improve the quality of this paper.
References (49)
- et al., A cloud based architecture capable of perceiving and predicting multiple vessel behaviour, Appl. Soft Comput. (2015)
- et al., An extensive experimental comparison of methods for multi-label learning, Pattern Recognit. (2012)
- et al., Learning multi-label scene classification, Pattern Recognit. (2004)
- et al., A lazy learning approach to multi-label learning, Pattern Recognit. (2007)
- et al., Online fuzzy time series analysis based on entropy discretization and a fast Fourier transform, Appl. Soft Comput. (2014)
- et al., Prediction of noisy chaotic time series using an optimal radial basis function neural network, IEEE Trans. Neural Netw. (2001)
- et al., Higher-order-statistics based radial basis function networks for signal enhancement, IEEE Trans. Neural Netw. (2007)
- et al., Radial basis function neural networks classification for the recognition of idiopathic pulmonary fibrosis in microscopic images, IEEE Trans. Inf. Technol. Biomed. (2008)
- et al., Short-term prediction of wind farm power: a data mining approach, IEEE Trans. Energy Convers. (2009)
- TAIEX forecasting based on fuzzy time series and fuzzy variation groups, IEEE Trans. Fuzzy Syst. (2011)
- TAIEX forecasting using fuzzy time series and automatically generated weights of multiple factors, IEEE Trans. Syst. Man Cybern. Part A
- Two machine learning approaches for short-term wind speed time-series prediction, IEEE Trans. Neural Netw. Learn. Syst.
- Machine learning methods for attack detection in the smart grid, IEEE Trans. Neural Netw. Learn. Syst.
- Lazy learning: special issue editorial, Artif. Intell. Rev.
- Neural Networks – A Comprehensive Foundation
- Introduction to Radial Basis Function Networks
- Shared kernel models for class conditional density estimation, IEEE Trans. Neural Netw.
- Maximum likelihood estimation from incomplete data via the EM algorithm, J. R. Stat. Soc. B
- Hybrid generative/discriminative training of radial basis function networks
- Estimation of elliptical basis function parameters by the EM algorithm with application to speaker verification, IEEE Trans. Neural Netw.
- An elliptical basis function network for classification of remote-sensing images, J. Geogr. Syst.
- A very fast neural learning for classification using only new incoming datum, IEEE Trans. Neural Netw.
- Multi-label classification: an overview, Int. J. Data Wareh. Min.
- Classifier chains for multi-label classification, Mach. Learn. J.