Applied Soft Computing

Volume 60, November 2017, Pages 552-563

An oblique elliptical basis function network approach for supervised learning applications

https://doi.org/10.1016/j.asoc.2017.07.019

Highlights

  • Proposing a network model with the oblique elliptical basis function.

  • Developing procedures for deriving networks for supervised learning applications.

  • Data in oblique elliptical distributions can be represented in a more natural way.

  • Applicable to single-label classification, multi-label classification, as well as regression problems.

  • Demonstrating the effectiveness of the proposed approach through experiments on real-world data sets.

Abstract

We propose a neural network architecture based on the oblique elliptical basis function for supervised learning problems. In classification, a category can be a disconnected or non-convex region involving several overlapping or disjoint sub-regions of the feature space. Many existing supervised learning methods are restricted to convex decision regions. Our proposed method overcomes this restriction by employing a rotational self-constructing clustering algorithm to decompose the feature space into a collection of sub-regions which can then be combined to make up individual categories. An unseen instance is classified into a certain category if its similarity to the category exceeds a threshold. The whole framework fits in a five-layer network consisting of input, component-similarity, cluster-similarity, aggregation, and output layers. A similar idea also applies to solving regression problems. A parameter learning algorithm based on least squares estimation is used to derive the weights of the underlying network. Our approach offers several practical advantages. Through the incorporation of rotation, data can be clustered more appropriately than by standard elliptical basis functions. Also, our approach is applicable to single-label classification, multi-label classification, as well as regression problems. A number of experiments are conducted to show the effectiveness of the proposed approach.

Introduction

Supervised learning, including classification and regression, plays an important role in many fields [1], [2], [3], [4], [5], [6], [7], [8], [9]. Two kinds of classification tasks are encountered. One is single-label classification in which each instance belongs to only one category. The other one is multi-label classification in which an instance is allowed to belong to more than one category. For regression, one is usually interested in estimating the numerical relationship between a dependent variable and one or more independent variables.

Many methods, based on machine learning techniques such as k-nearest neighbors (KNN) [10], multilayered perceptrons (MLP) [11], and radial basis function (RBF) neural networks [12], have been proposed for single-label classification. KNN is a type of lazy learning. For an unseen pattern, its k nearest neighbors in the feature space are found, and the pattern is assigned to the most common class among them. KNN is simple, but no training is done and all computation is deferred until classification. An MLP is a feedforward artificial neural network consisting of multiple layers of nodes, with each layer fully connected to the next one. Each node in the hidden layers is a neuron with a nonlinear activation function. An MLP is usually trained with a supervised learning technique called backpropagation. An RBF network uses radial basis functions as activation functions; the output of the network is a linear combination of the outputs of the radial basis functions. In [13], basis functions are interpreted as probability density functions and the weights as prior probabilities; models that output class conditional densities or mixture densities are proposed, together with a training algorithm based on the expectation-maximization (EM) algorithm [14]. Hybrid training of this type of RBF network is addressed in [15]. Some extensions of RBF neural networks, e.g., elliptical basis function (EBF) [16], [17] and versatile EBF (VEBF) [18] neural networks, have also been proposed. Fast training algorithms are used to learn a data set in only one pass, and the network structure is flexible and can be adjusted during the training process. However, the Euclidean or Mahalanobis distance is used, and the learning of output weights is not taken into consideration.
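To make the relationship between the basis functions and the network output concrete, the following is a minimal sketch of a Gaussian RBF network's forward pass, assuming fixed centers, widths, and output weights; the function and parameter names here are illustrative, not taken from the cited works.

```python
import numpy as np

def rbf_network_output(x, centers, widths, weights):
    """Output of a Gaussian RBF network: a linear combination of
    radial basis activations.

    x       : (s,)   input vector
    centers : (m, s) basis-function centers
    widths  : (m,)   per-basis Gaussian widths (sigmas)
    weights : (m,)   output weights
    """
    # Squared Euclidean distance from x to each center.
    d2 = np.sum((centers - x) ** 2, axis=1)
    # Gaussian radial basis activations.
    phi = np.exp(-d2 / (2.0 * widths ** 2))
    # The network output is a weighted sum of the activations.
    return weights @ phi
```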

To deal with multi-label text classification, two approaches are mainly adopted [19], [20]. One approach transforms a multi-label classification task into several single-label classification tasks to which single-label classification methods can be applied [21], [22]. The other extends the capability of a specific single-label classification algorithm to handle multi-label data directly. One popular problem transformation method, called binary relevance, was proposed in [21]. The method transforms the original dataset into p datasets, where p is the number of categories associated with the original dataset. Each resulting dataset contains all instances of the original dataset with only two labels, "belonging to" or "not belonging to" a particular category; a minimal sketch of this transformation is given below. Since the resulting datasets are single-labeled, all single-label classification techniques are applicable to them. Several neural-network approaches designed for multi-label classification have been proposed [19], [23], [24], [25], [26], [27]. A kernel method for multi-label classification is proposed in [21]. The back-propagation multi-label learning (BP-MLL) algorithm [28] is a multi-label version of the back-propagation neural network. To handle multiple labels, label co-occurrence is incorporated into the pairwise ranking loss function. However, a complex global error function has to be minimized. The multi-label k-nearest neighbors (ML-KNN) algorithm [29] is a lazy learning algorithm which requires an expensive run-time search. ML-RBF [30] is a multi-label RBF neural network which extends the traditional RBF learning algorithm. Multi-label with fuzzy relevance clustering (ML-FRC) [31] is a fuzzy relevance clustering based method for multi-label text categorization. Nam et al. [32] report that binary cross entropy can outperform the pairwise ranking loss by leveraging rectified linear units for nonlinearity. Kurata et al. [33] propose a neural network initialization method that treats some of the neurons in the final hidden layer as dedicated neurons for each pattern of label co-occurrence. These dedicated neurons are initialized to connect to the corresponding co-occurring labels with stronger weights than to others.
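The sketch below illustrates the binary relevance transformation described above, assuming the labels are encoded as a 0/1 matrix; the function name and the matrix representation are our own assumptions, not from [21].

```python
import numpy as np

def binary_relevance_split(X, Y):
    """Transform a multi-label dataset into p single-label (binary)
    datasets, one per category, in the spirit of binary relevance.

    X : (N, s) feature matrix
    Y : (N, p) 0/1 label matrix; Y[i, d] = 1 iff instance i has label d
    Returns a list of (X, y_d) pairs, one binary target vector per category.
    """
    p = Y.shape[1]
    # Every resulting dataset keeps all N instances; only the target
    # ("belongs to category d" or not) changes.
    return [(X, Y[:, d]) for d in range(p)]
```

Any single-label binary classifier can then be trained on each pair, and a label set for an unseen instance is obtained by combining the p independent predictions.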

Various approaches based on KNN, MLP, and RBF neural networks have been proposed for regression problems [10], [11], [12]. The KNN regression method is a nonparametric method which bases its prediction on the k nearest neighbors of the target to be forecasted. However, it lacks the capability of adaptation and its testing time is long. For the MLP and RBF networks used in regression, the outputs are numerical values instead of categories, and the network parameters are learned from the training data. However, determining the number of hidden nodes is a challenging issue. Fuzzy theory has been incorporated for prediction [34]. However, membership functions need to be determined, which is often a challenging task, and fuzzy theory by itself offers no learning. A neuro-fuzzy scheme combining fuzzy theory with a hybrid learning method has been proposed for regression problems [35]. A set of fuzzy IF-THEN rules is extracted from the training examples; a fuzzy neural network is then constructed accordingly, and its parameters are refined to increase the precision of the fuzzy rule base. Extreme learning machines (ELMs) [36], [37] have also been proposed to deal with regression problems. An ELM is a single-hidden-layer feedforward network. The input weights connecting the input layer to the hidden layer, as well as the biases of the hidden neurons, are assigned random values and are not trained. The output weights, connecting the hidden layer to the output layer, are determined by learning from training patterns.
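The division of labor in an ELM (random, untrained hidden layer; closed-form output weights) can be summarized in a short sketch; the tanh activation and the helper names are assumptions of ours, not prescribed by [36], [37].

```python
import numpy as np

def train_elm(X, Y, m, rng=None):
    """Minimal extreme learning machine: random hidden layer,
    output weights fitted by least squares (no iterative training).

    X : (N, s) inputs, Y : (N, p) targets, m : number of hidden nodes
    """
    if rng is None:
        rng = np.random.default_rng(0)
    s = X.shape[1]
    W = rng.standard_normal((s, m))   # random input weights (not trained)
    b = rng.standard_normal(m)        # random hidden biases (not trained)
    H = np.tanh(X @ W + b)            # hidden-layer activations
    # Output weights: least squares solution of H @ beta = Y.
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Forward pass of the trained ELM."""
    return np.tanh(X @ W + b) @ beta
```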

In this paper, we present a neural network architecture based on the oblique elliptical basis function for classification and regression problems. In classification, a category can involve several overlapping or disjoint sub-regions. We decompose the feature space into a collection of sub-regions which can then be combined to make up individual categories. An unseen instance is classified into a certain category if its similarity to the category exceeds a threshold. The whole framework fits in a five-layer network consisting of input, component-similarity, cluster-similarity, aggregation, and output layers. A similar idea also applies to solving regression problems. Three steps are proposed for training such a network: network initialization, parameter learning, and output activation. First, a group of clusters is located for the given training data by a rotational self-constructing clustering algorithm. Second, a parameter learning algorithm based on least squares estimation is used to set the weights of the network. Finally, transfer functions are applied to derive the output response of the network. Several existing supervised learning methods restrict decision regions to be convex. For instance, the decision regions KNN [10] provides are convex: for a given input, the decision region can be regarded as the surrounding convex space containing its k nearest neighbors. The perceptron algorithm [38], [39] is another example, where the decision boundary separates two convex decision regions. Support vector machines (SVM) [40], [41] are yet another example: in an SVM model, a hyperplane is derived to divide the space into two convex decision regions. Our proposed method, in contrast, allows the decision region associated with an output node to be disconnected or non-convex.

This work differs from [31], [35] in that our clustering algorithm allows the given data to be grouped into clusters expressed in the form of oblique elliptical basis functions. In [31], [35], standard elliptical basis functions are used: no rotation is applied to a basis function, so its axes are parallel to the coordinate axes. In this work, rotations are allowed and the axes need not be parallel to the coordinate axes. As a result, data can be clustered more appropriately by oblique elliptical basis functions than by standard elliptical basis functions. Consider a simple example in Fig. 1(a) containing 9 two-dimensional data points. A cluster with a standard elliptical basis function embracing these points is shown by the dotted curve in Fig. 1(b), while a cluster with an oblique elliptical basis function for the same points is shown in Fig. 1(c).

The cluster of Fig. 1(c) fits the 9 data points more tightly than the cluster of Fig. 1(b), covering less extraneous space. Moreover, the work in [31], [35] applies only to either regression or classification problems, while the approach proposed in this paper is applicable to single-label classification, multi-label classification, as well as regression problems.
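To illustrate the effect of rotation, the sketch below evaluates a generic rotated (oblique) elliptical Gaussian in two dimensions. It is not the exact membership function of the proposed network, only an illustration of how a rotation angle decouples the ellipse axes from the coordinate axes; all names are hypothetical.

```python
import numpy as np

def oblique_ebf(x, mean, sigmas, theta):
    """Oblique elliptical (rotated Gaussian) basis function in 2-D.

    x      : (2,) query point
    mean   : (2,) ellipse center
    sigmas : (2,) axis widths of the ellipse
    theta  : rotation angle in radians
    """
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])            # 2-D rotation matrix
    z = R.T @ (np.asarray(x, float) - mean)    # rotate into the ellipse's frame
    # Axis-aligned elliptical Gaussian in the rotated frame.
    return np.exp(-0.5 * np.sum((z / sigmas) ** 2))
```

With theta = 0 the function reduces to a standard elliptical basis function whose axes are parallel to the coordinates, as in Fig. 1(b); choosing theta to match the orientation of an elongated point cloud yields the tighter fit of Fig. 1(c).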

The rest of this paper is organized as follows. The use of neural networks for supervised learning is addressed in Section 2. Our proposed neural network approach is described in Section 3. A simple example for illustration is given in Section 4. Experimental results are presented in Section 5. Finally, concluding remarks are given in Section 6.

Section snippets

Supervised learning with neural networks

Let $X = \{(\mathbf{x}^{(i)}, \mathbf{y}^{(i)}) \mid 1 \le i \le N\}$ be a finite set of $N$ training instances, where $\mathbf{x}^{(i)} \in \mathbb{R}^s$ is an input vector with $s$ features, i.e., $\mathbf{x}^{(i)} = \langle x_1^{(i)}, x_2^{(i)}, \ldots, x_s^{(i)} \rangle^T$, and $\mathbf{y}^{(i)}$ is the corresponding target vector with $p$ components, i.e., $\mathbf{y}^{(i)} = \langle y_1^{(i)}, y_2^{(i)}, \ldots, y_p^{(i)} \rangle^T$, of the $i$th instance. Two cases are concerned:

  • For classification problems, $\mathbf{y}^{(i)}$ is a category vector defined as follows:

$$y_d^{(i)} = \begin{cases} 1, & \text{if } \mathbf{x}^{(i)} \text{ belongs to category } d;\\ 0, & \text{if } \mathbf{x}^{(i)} \text{ does not belong to category } d, \end{cases}$$ for $1 \le d \le p$, where $p$ is the number of categories associated

Proposed approach

In this section, we first present the neural network architecture for supervised learning and describe the function of each layer. Then the learning of the network parameters is developed. These parameters include the number of hidden nodes, the associated means and deviations, the output weights, as well as the output thresholds.
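For the output weights, a least squares fit can be written in closed form. The sketch below assumes the hidden layers have already been initialized by the clustering step and produce an activation matrix H for the training set; the bias handling and names are our own illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def fit_output_weights(H, Y):
    """Least squares estimation of the output-layer weights.

    H : (N, m) hidden activations (cluster similarities) for the
        N training instances, with m clusters from the clustering step
    Y : (N, p) target vectors
    Returns W of shape (m+1, p), including a bias row, minimizing
    ||[H 1] W - Y||^2.
    """
    # Augment activations with a constant column for the bias term.
    H1 = np.hstack([H, np.ones((H.shape[0], 1))])
    W, *_ = np.linalg.lstsq(H1, Y, rcond=None)
    return W
```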

Our proposed network architecture is shown in Fig. 2, which consists of five layers: input layer (Layer 1), component-similarity layer (Layer 2), cluster-similarity

Example of a network training

We give an example of single-label classification to illustrate how the proposed idea works. Suppose we have the following training instances:

(x(1) = 〈0.55, 0.72〉T, y(1) = 〈1, 0〉T),
(x(2) = 〈0.30, 0.60〉T, y(2) = 〈1, 0〉T),
(x(3) = 〈0.70, 0.35〉T, y(3) = 〈0, 1〉T),
(x(4) = 〈0.50, 0.52〉T, y(4) = 〈1, 0〉T),
(x(5) = 〈0.58, 0.38〉T, y(5) = 〈0, 1〉T),
(x(6) = 〈0.49, 0.40〉T, y(6) = 〈0, 1〉T),
(x(7) = 〈0.78, 0.20〉T, y(7) = 〈0, 1〉T),
(x(8) = 〈0.62, 0.25〉T, y(8) = 〈0, 1〉T),
(x(9) = 〈0.40, 0.65〉T, y(9) = 〈1, 0〉T),
(x(10) = 〈0.35, 0.38〉T, y(10) = 〈1, 0〉T),
(x(11)

Experimental results

In this section, we present experimental results to show the effectiveness of our proposed approach. Comparisons of testing accuracy and training time with other methods are also presented. The results for single-label classification are shown first, followed by those for multi-label classification and regression.

For convenience, we name our approach the oblique elliptical basis function network, abbreviated as OEBFN. We use a computer with an Intel(R) Core(TM) i7 CPU and 16 GB of RAM to

Concluding remarks

We have presented a five-layer neural network architecture based on the oblique elliptical basis function and an approach for setting up such networks for supervised learning applications. From a given set of training data, a group of clusters is located by a rotational self-constructing clustering algorithm. Through the incorporation of rotation, data in oblique elliptical distributions can be represented more appropriately than by standard elliptical basis functions. A parameter learning

Acknowledgments

This work was supported by the Ministry of Science and Technology under grant MOST-103-2221-E-110-047-MY2, and by the "Aim for the Top University Plan" of National Sun Yat-Sen University and the Ministry of Education. The authors are grateful to the anonymous reviewers for their constructive comments, which greatly helped improve the quality of this paper.

References (49)

  • S.M. Chen et al., TAIEX forecasting using fuzzy time series and automatically generated weights of multiple factors, IEEE Trans. Syst. Man Cybern. Part A (2012).

  • R. Ak et al., Two machine learning approaches for short-term wind speed time-series prediction, IEEE Trans. Neural Netw. Learn. Syst. (2016).

  • M. Ozay et al., Machine learning methods for attack detection in the smart grid, IEEE Trans. Neural Netw. Learn. Syst. (2016).

  • D.W. Aha, Lazy learning: special issue editorial, Artif. Intell. Rev. (1997).

  • S. Haykin, Neural Networks – A Comprehensive Foundation (1999).

  • M.J.L. Orr, Introduction to Radial Basis Function Networks (1996).

  • M. Titsias et al., Shared kernel models for class conditional density estimation, IEEE Trans. Neural Netw. (2001).

  • A.P. Dempster et al., Maximum likelihood estimation from incomplete data via the EM algorithm, J. R. Stat. Soc. B (1977).

  • A. Ferreira et al., Hybrid generative/discriminative training of radial basis function networks.

  • M.W. Mak et al., Estimation of elliptical basis function parameters by the EM algorithm with application to speaker verification, IEEE Trans. Neural Netw. (2000).

  • J.C. Luo et al., An elliptical basis function network for classification of remote-sensing images, J. Geogr. Syst. (2004).

  • S. Jaiyen et al., A very fast neural learning for classification using only new incoming datum, IEEE Trans. Neural Netw. (2010).

  • G. Tsoumakas et al., Multi-label classification: an overview, Int. J. Data Wareh. Min. (2007).

  • J. Read et al., Classifier chains for multi-label classification, Mach. Learn. J. (2011).