Shell fitting space for classification

https://doi.org/10.1016/j.eswa.2010.09.127

Abstract

In this paper, a shell fitting space (SFS) is presented that maps non-linearly separable data to linearly separable data. A linear or quadratic transformation can map data into a new space for better classification, provided the transformation is guessed properly. The SFS can in principle be of high or low dimensionality, but its dimensionality is generally low: it equals the number of classes. The SFS method is based on fitting a hyper-plane or shell to the training data of each class, or enclosing them in a hyper-surface. In the proposed method, these hyper-planes, curves, or shells become the axes of the new space, in which a linear multi-class support vector machine (SVM) classifier is applied to the training data.

Research highlights

► Classification is a very important field of pattern mining. ► Distance-based classification is a simple way to classify objects. ► Classification using the shell fitting space improves classification results.

Introduction

Classification is an important research area with a wide range of applications. Non-linear discriminant functions (NDFs) are useful for training a system to recognize specific patterns, and many applications are now based on this approach. Neural networks and support vector machines are the preeminent mathematical tools among NDFs. Support vector machines (Vapnik, 1995) are very popular and powerful learning systems because they use kernel machines for linearization, provide good generalization properties, classify input patterns with minimized structural misclassification risk, and find an acceptable separating hyper-plane between two classes in the feature space.

Applying a kernel allows the algorithm to fit the maximum-margin hyper-plane in a transformed feature space. The transformation may be non-linear and the transformed space may be high-dimensional; thus, although the classifier is a hyper-plane in the high-dimensional feature space, it may be non-linear in the original input space. If the kernel is a Gaussian radial basis function, the corresponding feature space is a Hilbert space of infinite dimension. Maximum-margin classifiers are well regularized, so the infinite dimension does not spoil the results.
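For illustration, here is a minimal sketch of this kernel trick, assuming Python with scikit-learn (our choice; the paper prescribes no particular library or toy data):

```python
# Illustrative sketch only: scikit-learn and the toy data are our assumptions,
# not part of the paper.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the 2-D input space.
X, y = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

# The Gaussian RBF kernel corresponds to an infinite-dimensional Hilbert
# feature space, in which a maximum-margin hyper-plane can separate the rings.
clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)
print("training accuracy:", clf.score(X, y))      # close to 1.0
print("support vectors:", clf.n_support_.sum())   # often a sizeable fraction
```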

Kernel methods (KMs) (Abe, 2005, Huang et al., 2006, Scholkopf and Smola, 2002, Shawe-Taylor and Cristianini, 2004) map the input space into a high-dimensional feature (HDF) space, which may help with linearization. Several kernels have been proposed for this purpose, namely polynomials, Gaussians, and splines (Friedman, 1991), but these kernels do not guarantee linearization in the HDF space. This problem motivates us to present a new space in which patterns can be classified by a linear classifier; we name it the shell fitting space (SFS) because we use the concept of shell fitting to create the new space.

Support vector machines (SVMs) and their variants (Abe, 2005, Lin and Wang, 2002, Sadoghi Yazdi et al., 2007, Wang, 2005, Wu et al., 2007) are a particular instance of KMs, but they have some weaknesses:

  • Slow training (compared to neural networks), due to the computationally intensive solution of the QP problem; large amounts of training data in particular require special algorithms.

  • The kernel to be used is not determined in advance; it changes for each data set, and ultimately a large feature space (with many dimensions) is produced.

  • Slow classification for the trained SVM.

  • Generates complex solutions (normally > 60% of training points are used as support vectors), especially for large amounts of training data.

  • Difficult to incorporate prior knowledge.

In contrast with kernel methods, our proposed approach does not expand the original space into a space with many dimensions. In SFS, the mapping is done to an m-dimensional space, where m is the number of classes.

The rest of this paper is organized as follows. Section 2 develops our method; Section 3 shows how the new method works on example datasets; Section 4 applies the method to real datasets. In Section 5 we test our method with a simple linear classifier on SFS data and compare its accuracy with multi-class SVM results on the input-space data. Conclusions are drawn in Section 6.

Section snippets

Shell fitting space formulation

Definitions: $x_i^j \in \{x_1^1, x_2^1, \ldots, x_{k_1}^1, x_1^2, \ldots, x_{k_2}^2, \ldots, x_1^j, x_2^j, \ldots, x_{k_j}^j, \ldots\}$, $j = 1, \ldots, m$, is the $i$th sample (with $n$ dimensions) of class $j$.

$C_j$ is the curve, hyper-plane, or enclosing shell fitted to the set $\{(x_i^j, y_j),\; i = 1, \ldots, k_j\}$ of data, where $y_j$ is the $j$th label of the training data and $k_j$ is the number of samples with label $j$. In general, the mapping takes data belonging to $m$ patterns (classes) in an $n$-dimensional space into an $m$-dimensional space:

$$\varphi : X = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n \;\mapsto\; \varphi(X) \in \mathbb{R}^m,$$

where the $j$th component of $\varphi(X)$ is the distance from $X$ to the fitted shell $C_j$.
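As a concrete two-dimensional instance of $\varphi$, the following sketch (ours, not the paper's; it assumes NumPy, and the circular shells and function names are illustrative choices) fits a circle to each class by an algebraic least-squares (Kåsa) fit and maps every point to its vector of distances to the $m$ fitted shells:

```python
import numpy as np

def fit_circle(points):
    """Algebraic least-squares (Kasa) circle fit in 2-D.
    Solves x^2 + y^2 + a*x + b*y + c = 0 for (a, b, c) and
    returns the center and radius of the fitted circle."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    rhs = -(x**2 + y**2)
    a, b, c = np.linalg.lstsq(A, rhs, rcond=None)[0]
    center = np.array([-a / 2.0, -b / 2.0])
    radius = np.sqrt(center @ center - c)
    return center, radius

def sfs_transform(X, shells):
    """phi: R^n -> R^m. Component j is the distance of each point to the
    fitted shell C_j (here, a circle given as a (center, radius) pair)."""
    cols = [np.abs(np.linalg.norm(X - c, axis=1) - r) for c, r in shells]
    return np.column_stack(cols)
```

Each training point of class $j$ then lies near the hyper-plane $\varphi_j = 0$ of the new space, which is what makes a simple linear classifier sufficient there.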

Experimental results

The proposed transformation, with circles as the fitted shells, is first demonstrated on simple data as an illustrative example; at the end, the method is tested on some well-known datasets. In the following discussion we assume that the label of each training datum is known in advance. Our method was implemented and tested in MATLAB (MathWorks Inc.).
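A hedged illustration of such a demonstration, reusing fit_circle and sfs_transform from the sketch above on scikit-learn's synthetic ring data (our toy example, not necessarily the one used in the paper):

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import Perceptron  # a simple linear stand-in

# Two ring-shaped classes that are not linearly separable as given.
X, y = make_circles(n_samples=300, factor=0.4, noise=0.03, random_state=1)

# Fit one circular shell per class, then map into the 2-D SFS.
shells = [fit_circle(X[y == j]) for j in (0, 1)]
Z = sfs_transform(X, shells)

# In the SFS, class j clusters around z_j = 0, so a linear rule suffices.
print("linear accuracy in SFS:",
      Perceptron(random_state=0).fit(Z, y).score(Z, y))
```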

Experiments on datasets

In this section our proposed method is used to classify several datasets from the UCI Machine Learning Repository: the Breast Cancer Wisconsin, Iris, Ionosphere, Transfusion, and Heart datasets, with 11, 5, 35, 15, 4, and 14 attributes and 2, 3, 2, 3, 2, and 2 classes, respectively.

Accuracy of our method in comparison with multi-class SVM

To show the performance of our method, we first transformed the data of each dataset into the proposed shell fitting space and then used a simple linear classifier (Adaline) to classify the transformed data. At this stage, 50% of the data in each dataset was used for training. A kernel-based SVM classifier was applied to the original data for comparison with our classification accuracy; among the available kernels, the one with the best performance on each dataset was used.
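A sketch of this protocol under stated assumptions: scikit-learn's SGDClassifier with squared loss stands in for Adaline, the shell-fitting helpers come from the sketch above, and X, y denote one dataset's features and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC

# 50% of the data for training, as in the protocol above.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.5, stratify=y, random_state=0)

# SFS route: fit the shells on training data only, then a simple linear unit
# (squared loss makes SGDClassifier an Adaline-like stand-in).
shells = [fit_circle(X_tr[y_tr == j]) for j in np.unique(y_tr)]
lin = SGDClassifier(loss="squared_error", random_state=0)
lin.fit(sfs_transform(X_tr, shells), y_tr)
acc_sfs = lin.score(sfs_transform(X_te, shells), y_te)

# Baseline: kernel SVM on the original features, reporting the best kernel.
acc_svm = max(SVC(kernel=k).fit(X_tr, y_tr).score(X_te, y_te)
              for k in ("linear", "poly", "rbf", "sigmoid"))
print(f"linear on SFS: {acc_sfs:.3f}   best-kernel SVM: {acc_svm:.3f}")
```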

Conclusions

In this paper we introduced a new transformation space in which input data are easier to classify than in other transformation spaces. The main idea is to use the distance of the data to a line, curve, or hyper-plane fitted to each class's data, or to shells enclosing the classes. The results show that this space transformation works well across the tested collections of data.

References (19)

  • Abe, S. (2005). Support vector machines for pattern classification.

  • Abe, S. (2005). Advances in pattern recognition.

  • Bishop, C. (1995). Neural networks for pattern recognition.

  • Friedman, J. H. (1991). Multivariate adaptive regression splines. Annals of Statistics.

  • Huang, T., et al. (2006). Kernel based algorithms for mining huge data sets.

  • UC Irvine Machine Learning Repository.

  • Jang, J.-S. R. (1993). ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics.

  • Jang, J.-S. R., et al. (1997). Neuro-Fuzzy and Soft Computing: A computational approach to learning and machine intelligence.

  • Lin, C.-F., et al. (2002). Fuzzy support vector machine. IEEE Transactions on Neural Networks.
There are more references available in the full text version of this article.
