Applied Soft Computing

Volume 60, November 2017, Pages 552-563

An oblique elliptical basis function network approach for supervised learning applications

https://doi.org/10.1016/j.asoc.2017.07.019

Highlights

  • Proposing a network model with the oblique elliptical basis function.

  • Developing procedures for deriving networks for supervised learning applications.

  • Data in oblique elliptical distributions can be represented in a more natural way.

  • Applicable to single-label classification, multi-label classification, as well as regression problems.

  • Demonstrating the effectiveness of the proposed approach through experiments on real-world data sets.

Abstract

We propose a neural network architecture based on the oblique elliptical basis function for supervised learning problems. In classification, a category can be a disconnected or non-convex region involving several overlapping or disjoint sub-regions of the feature space. Many existing supervised learning methods are restricted to convex decision regions. Our proposed method overcomes this restriction by employing a rotational self-constructing clustering algorithm to decompose the feature space into a collection of sub-regions which can then be combined to make up individual categories. An unseen instance is classified into a certain category if its similarity to the category exceeds a threshold. The whole framework fits in a five-layer network consisting of input, component-similarity, cluster-similarity, aggregation, and output layers. A similar idea also applies to solving regression problems. A parameter learning algorithm based on least squares estimation is used to derive the weights of the underlying network. Our approach offers several practical advantages. Through the incorporation of rotation, data can be clustered more appropriately than by standard elliptical basis functions. Also, our approach is applicable to single-label classification, multi-label classification, as well as regression problems. A number of experiments are conducted to show the effectiveness of the proposed approach.

Introduction

Supervised learning, including classification and regression, plays an important role in many fields [1], [2], [3], [4], [5], [6], [7], [8], [9]. Two kinds of classification tasks are encountered. One is single-label classification in which each instance belongs to only one category. The other one is multi-label classification in which an instance is allowed to belong to more than one category. For regression, one is usually interested in estimating the numerical relationship between a dependent variable and one or more independent variables.

Many methods, based on machine learning techniques such as k-nearest neighbors (KNN) [10], multilayered perceptrons (MLP) [11], and radial basis function (RBF) neural networks [12], have been proposed for single-label classification. KNN is a type of lazy learning. For an unseen pattern, its k nearest neighbors in the feature space are found, and the pattern is assigned to the most common class among them. KNN is simple, but no training is done and all computation is deferred until classification. An MLP is a feedforward artificial neural network consisting of multiple layers of nodes, with each layer fully connected to the next one. Each node in the hidden layers is a neuron with a nonlinear activation function. An MLP is usually trained with a supervised learning technique called backpropagation. An RBF network uses radial basis functions as activation functions; the output of the network is a linear combination of the outputs of the radial basis functions. In [13], basis functions are interpreted as probability density functions and the weights as prior probabilities; models that output class conditional densities or mixture densities are proposed, together with a training algorithm based on the expectation-maximization (EM) algorithm [14]. Hybrid training of this type of RBF network is addressed in [15]. Some extensions of RBF neural networks, e.g., elliptical basis function (EBF) [16], [17] and versatile EBF (VEBF) [18] neural networks, have also been proposed. Fast training algorithms are used to learn a data set in only one pass, and the network structure is flexible and can be adjusted during the training process. However, the Euclidean or Mahalanobis distance is used, and the learning of output weights is not taken into consideration.
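To make the relationship between the basis functions and the network output concrete, the following is a minimal sketch of a Gaussian RBF network's forward pass, assuming fixed centers, widths, and output weights; the function and parameter names here are illustrative, not taken from the cited works.

```python
import numpy as np

def rbf_network_output(x, centers, widths, weights):
    """Output of a Gaussian RBF network: a linear combination of
    radial basis activations.

    x       : (s,)   input vector
    centers : (m, s) basis-function centers
    widths  : (m,)   per-basis Gaussian widths (sigmas)
    weights : (m,)   output weights
    """
    # Squared Euclidean distance from x to each center.
    d2 = np.sum((centers - x) ** 2, axis=1)
    # Gaussian radial basis activations.
    phi = np.exp(-d2 / (2.0 * widths ** 2))
    # The network output is a weighted sum of the activations.
    return weights @ phi
```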

To deal with multi-label text classification, two approaches are mainly adopted [19], [20]. One approach transforms a multi-label classification task into several single-label classification tasks to which single-label classification methods can be applied [21], [22]. The other extends the capability of a specific single-label classification algorithm to handle multi-label data directly. One popular problem transformation method, called binary relevance, was proposed in [21]. The method transforms the original dataset into p datasets, where p is the number of categories associated with the original dataset. Each resulting dataset contains all instances of the original dataset with only two labels, "belonging to" or "not belonging to" a particular category; a minimal sketch of this transformation is given below. Since the resulting datasets are single-labeled, all single-label classification techniques are applicable to them. Several neural-network approaches designed for multi-label classification have been proposed [19], [23], [24], [25], [26], [27]. A kernel method for multi-label classification is proposed in [21]. The back-propagation multi-label learning (BP-MLL) algorithm [28] is a multi-label version of the back-propagation neural network. To handle multiple labels, label co-occurrence is incorporated into the pairwise ranking loss function. However, a complex global error function has to be minimized. The multi-label k-nearest neighbors (ML-KNN) algorithm [29] is a lazy learning algorithm which requires an expensive run-time search. ML-RBF [30] is a multi-label RBF neural network which extends the traditional RBF learning algorithm. Multi-label with fuzzy relevance clustering (ML-FRC) [31] is a fuzzy relevance clustering based method for multi-label text categorization. Nam et al. [32] report that binary cross entropy can outperform the pairwise ranking loss by leveraging rectified linear units for nonlinearity. Kurata et al. [33] propose a neural network initialization method that treats some of the neurons in the final hidden layer as dedicated neurons for each pattern of label co-occurrence. These dedicated neurons are initialized to connect to the corresponding co-occurring labels with stronger weights than to others.
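The sketch below illustrates the binary relevance transformation described above, assuming the labels are encoded as a 0/1 matrix; the function name and the matrix representation are our own assumptions, not from [21].

```python
import numpy as np

def binary_relevance_split(X, Y):
    """Transform a multi-label dataset into p single-label (binary)
    datasets, one per category, in the spirit of binary relevance.

    X : (N, s) feature matrix
    Y : (N, p) 0/1 label matrix; Y[i, d] = 1 iff instance i has label d
    Returns a list of (X, y_d) pairs, one binary target vector per category.
    """
    p = Y.shape[1]
    # Every resulting dataset keeps all N instances; only the target
    # ("belongs to category d" or not) changes.
    return [(X, Y[:, d]) for d in range(p)]
```

Any single-label binary classifier can then be trained on each pair, and a label set for an unseen instance is obtained by combining the p independent predictions.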

Various approaches based on KNN, MLP, and RBF neural networks have been proposed for regression problems [10], [11], [12]. The KNN regression method is a nonparametric method which bases its prediction on the k nearest neighbors of the target to be forecasted. However, it lacks the capability of adaptation and its testing time is long. For the MLP and RBF networks used in regression, the outputs are numerical values instead of categories, and the network parameters are learned from the training data. However, determining the number of hidden nodes is a challenging issue. Fuzzy theory has been incorporated for prediction [34]. However, membership functions need to be determined, which is often a challenging task, and fuzzy theory by itself offers no learning. A neuro-fuzzy scheme combining fuzzy theory with a hybrid learning method has been proposed for regression problems [35]. A set of fuzzy IF-THEN rules is extracted from the training examples; a fuzzy neural network is then constructed accordingly, and its parameters are refined to increase the precision of the fuzzy rule base. Extreme learning machines (ELMs) [36], [37] have also been proposed to deal with regression problems. An ELM is a single-hidden-layer feedforward network. The input weights connecting the input layer to the hidden layer, as well as the biases of the hidden neurons, are assigned random values and are not trained. The output weights, connecting the hidden layer to the output layer, are determined by learning from training patterns.
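The division of labor in an ELM (random, untrained hidden layer; closed-form output weights) can be summarized in a short sketch; the tanh activation and the helper names are assumptions of ours, not prescribed by [36], [37].

```python
import numpy as np

def train_elm(X, Y, m, rng=None):
    """Minimal extreme learning machine: random hidden layer,
    output weights fitted by least squares (no iterative training).

    X : (N, s) inputs, Y : (N, p) targets, m : number of hidden nodes
    """
    if rng is None:
        rng = np.random.default_rng(0)
    s = X.shape[1]
    W = rng.standard_normal((s, m))   # random input weights (not trained)
    b = rng.standard_normal(m)        # random hidden biases (not trained)
    H = np.tanh(X @ W + b)            # hidden-layer activations
    # Output weights: least squares solution of H @ beta = Y.
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Forward pass of the trained ELM."""
    return np.tanh(X @ W + b) @ beta
```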

In this paper, we present a neural network architecture based on the oblique elliptical basis function for classification and regression problems. In classification, a category can involve several overlapping or disjoint sub-regions. We decompose the feature space into a collection of sub-regions which can then be combined to make up individual categories. An unseen instance is classified into a certain category if its similarity to the category exceeds a threshold. The whole framework fits in a five-layer network consisting of input, component-similarity, cluster-similarity, aggregation, and output layers. A similar idea also applies to solving regression problems. Three steps are proposed for training such a network: network initialization, parameter learning, and output activation. First, a group of clusters is located for the given training data by a rotational self-constructing clustering algorithm. Second, a parameter learning algorithm based on least squares estimation is used to set the weights of the network. Finally, transfer functions are applied to derive the output response of the network. Several existing supervised learning methods restrict decision regions to be convex. For instance, the decision regions KNN [10] provides are convex: for a given input, the decision region can be regarded as the surrounding convex space containing its k nearest neighbors. The perceptron algorithm [38], [39] is another example, where the decision boundary separates two convex decision regions. Support vector machines (SVM) [40], [41] are yet another example: in an SVM model, a hyperplane is derived to divide the space into two convex decision regions. Our proposed method, in contrast, allows the decision region associated with an output node to be disconnected or non-convex.

This work differs from [31], [35] in that our clustering algorithm allows the given data to be grouped into clusters expressed in the form of oblique elliptical basis functions. In [31], [35], standard elliptical basis functions are used: no rotation is applied to a basis function, so its axes are parallel to the coordinate axes. In this work, rotations are allowed and the axes need not be parallel to the coordinate axes. As a result, data can be clustered more appropriately by oblique elliptical basis functions than by standard elliptical basis functions. Consider a simple example in Fig. 1(a) containing 9 two-dimensional data points. A cluster with a standard elliptical basis function embracing these points is shown by the dotted curve in Fig. 1(b), while a cluster with an oblique elliptical basis function for the same points is shown in Fig. 1(c).

The cluster of Fig. 1(c) fits the 9 data points more tightly than the cluster of Fig. 1(b), covering less extraneous space. Moreover, the work in [31], [35] applies only to either regression or classification problems, while the approach proposed in this paper is applicable to single-label classification, multi-label classification, as well as regression problems.
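To illustrate the effect of rotation, the sketch below evaluates a generic rotated (oblique) elliptical Gaussian in two dimensions. It is not the exact membership function of the proposed network, only an illustration of how a rotation angle decouples the ellipse axes from the coordinate axes; all names are hypothetical.

```python
import numpy as np

def oblique_ebf(x, mean, sigmas, theta):
    """Oblique elliptical (rotated Gaussian) basis function in 2-D.

    x      : (2,) query point
    mean   : (2,) ellipse center
    sigmas : (2,) axis widths of the ellipse
    theta  : rotation angle in radians
    """
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])            # 2-D rotation matrix
    z = R.T @ (np.asarray(x, float) - mean)    # rotate into the ellipse's frame
    # Axis-aligned elliptical Gaussian in the rotated frame.
    return np.exp(-0.5 * np.sum((z / sigmas) ** 2))
```

With theta = 0 the function reduces to a standard elliptical basis function whose axes are parallel to the coordinates, as in Fig. 1(b); choosing theta to match the orientation of an elongated point cloud yields the tighter fit of Fig. 1(c).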

The rest of this paper is organized as follows. The use of neural networks for supervised learning is addressed in Section 2. Our proposed neural network approach is described in Section 3. A simple example for illustration is given in Section 4. Experimental results are presented in Section 5. Finally, concluding remarks are given in Section 6.

Section snippets

Supervised learning with neural networks

Let $X = \{(\mathbf{x}^{(i)}, \mathbf{y}^{(i)}) \mid 1 \le i \le N\}$ be a finite set of $N$ training instances, where $\mathbf{x}^{(i)} \in \mathbb{R}^s$ is an input vector with $s$ features, i.e., $\mathbf{x}^{(i)} = \langle x_1^{(i)}, x_2^{(i)}, \ldots, x_s^{(i)} \rangle^T$, and $\mathbf{y}^{(i)}$ is the corresponding target vector with $p$ components, i.e., $\mathbf{y}^{(i)} = \langle y_1^{(i)}, y_2^{(i)}, \ldots, y_p^{(i)} \rangle^T$, of the $i$th instance. Two cases are concerned:

  • For classification problems, $\mathbf{y}^{(i)}$ is a category vector defined as follows:

$$y_d^{(i)} = \begin{cases} 1, & \text{if } \mathbf{x}^{(i)} \text{ belongs to category } d;\\ 0, & \text{if } \mathbf{x}^{(i)} \text{ does not belong to category } d, \end{cases}$$ for $1 \le d \le p$, where $p$ is the number of categories associated

Proposed approach

In this section, we first present the neural network architecture for supervised learning and describe the function of each layer. Then the learning of the network parameters is developed. These parameters include the number of hidden nodes, the associated means and deviations, the output weights, as well as the output thresholds.
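For the output weights, a least squares fit can be written in closed form. The sketch below assumes the hidden layers have already been initialized by the clustering step and produce an activation matrix H for the training set; the bias handling and names are our own illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def fit_output_weights(H, Y):
    """Least squares estimation of the output-layer weights.

    H : (N, m) hidden activations (cluster similarities) for the
        N training instances, with m clusters from the clustering step
    Y : (N, p) target vectors
    Returns W of shape (m+1, p), including a bias row, minimizing
    ||[H 1] W - Y||^2.
    """
    # Augment activations with a constant column for the bias term.
    H1 = np.hstack([H, np.ones((H.shape[0], 1))])
    W, *_ = np.linalg.lstsq(H1, Y, rcond=None)
    return W
```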

Our proposed network architecture is shown in Fig. 2, which consists of five layers: input layer (Layer 1), component-similarity layer (Layer 2), cluster-similarity

Example of a network training

We give an example of single-label classification to illustrate how the proposed idea works. Suppose we have the following training instances:

(x(1) = 〈0.55, 0.72〉T, y(1) = 〈1, 0〉T),
(x(2) = 〈0.30, 0.60〉T, y(2) = 〈1, 0〉T),
(x(3) = 〈0.70, 0.35〉T, y(3) = 〈0, 1〉T),
(x(4) = 〈0.50, 0.52〉T, y(4) = 〈1, 0〉T),
(x(5) = 〈0.58, 0.38〉T, y(5) = 〈0, 1〉T),
(x(6) = 〈0.49, 0.40〉T, y(6) = 〈0, 1〉T),
(x(7) = 〈0.78, 0.20〉T, y(7) = 〈0, 1〉T),
(x(8) = 〈0.62, 0.25〉T, y(8) = 〈0, 1〉T),
(x(9) = 〈0.40, 0.65〉T, y(9) = 〈1, 0〉T),
(x(10) = 〈0.35, 0.38〉T, y(10) = 〈1, 0〉T),
(x(11)

Experimental results

In this section, we present experimental results to show the effectiveness of our proposed approach. Comparisons of testing accuracy and training time with other methods are also presented. The results for single-label classification are shown first, followed by those for multi-label classification and regression.

For convenience, we name our approach the oblique elliptical basis function network, abbreviated as OEBFN. We use a computer with an Intel(R) Core(TM) i7 CPU and 16 GB of RAM to

Concluding remarks

We have presented a five-layer neural network architecture based on the oblique elliptical basis function and an approach for setting up such networks for supervised learning applications. From a given set of training data, a group of clusters is located by a rotational self-constructing clustering algorithm. Through the incorporation of rotation, data in oblique elliptical distributions can be represented more appropriately than by standard elliptical basis functions. A parameter learning

Acknowledgments

This work was supported by the Ministry of Science and Technology under grant MOST-103-2221-E-110-047-MY2, and by the "Aim for the Top University Plan" of National Sun Yat-Sen University and the Ministry of Education. The authors are grateful to the anonymous reviewers for their constructive comments, which greatly helped improve the quality of this paper.

References (49)

  • S.M. Chen et al., TAIEX forecasting using fuzzy time series and automatically generated weights of multiple factors, IEEE Trans. Syst. Man Cybern. Part A (2012).

  • R. Ak et al., Two machine learning approaches for short-term wind speed time-series prediction, IEEE Trans. Neural Netw. Learn. Syst. (2016).

  • M. Ozay et al., Machine learning methods for attack detection in the smart grid, IEEE Trans. Neural Netw. Learn. Syst. (2016).

  • D.W. Aha, Lazy learning: special issue editorial, Artif. Intell. Rev. (1997).

  • S. Haykin, Neural Networks – A Comprehensive Foundation (1999).

  • M.J.L. Orr, Introduction to Radial Basis Function Networks (1996).

  • M. Titsias et al., Shared kernel models for class conditional density estimation, IEEE Trans. Neural Netw. (2001).

  • A.P. Dempster et al., Maximum likelihood estimation from incomplete data via the EM algorithm, J. R. Stat. Soc. B (1977).

  • A. Ferreira et al., Hybrid generative/discriminative training of radial basis function networks.

  • M.W. Mak et al., Estimation of elliptical basis function parameters by the EM algorithm with application to speaker verification, IEEE Trans. Neural Netw. (2000).

  • J.C. Luo et al., An elliptical basis function network for classification of remote-sensing images, J. Geogr. Syst. (2004).

  • S. Jaiyen et al., A very fast neural learning for classification using only new incoming datum, IEEE Trans. Neural Netw. (2010).

  • G. Tsoumakas et al., Multi-label classification: an overview, Int. J. Data Wareh. Min. (2007).

  • J. Read et al., Classifier chains for multi-label classification, Mach. Learn. J. (2011).