Vicinal support vector classifier using supervised kernel-based clustering

https://doi.org/10.1016/j.artmed.2014.01.003

Abstract

Objective

Support vector machines (SVMs) have drawn considerable attention due to their high generalisation ability and superior classification performance compared to other pattern recognition algorithms. However, the assumption that all learning data are identically generated from a single unknown probability distribution may limit the application of SVMs to real-world problems. In this paper, we propose a vicinal support vector classifier (VSVC) which is shown to effectively handle practical applications where the learning data may originate from different probability distributions.

Methods

The proposed VSVC method utilises a set of new vicinal kernel functions which are constructed based on supervised clustering in the kernel-induced feature space. Our proposed approach comprises two steps. In the clustering step, a supervised kernel-based deterministic annealing (SKDA) clustering algorithm is employed to partition the training data into different soft vicinal areas of the feature space in order to construct the vicinal kernel functions. In the training step, the SVM technique is used to minimise the vicinal risk function under the constraints of the vicinal areas defined in the SKDA clustering step.

Results

Experimental results on both artificial and real medical datasets show that our proposed VSVC achieves better classification accuracy and lower computational time compared to a standard SVM. For an artificial dataset constructed from non-separated data, the classification accuracy of VSVC is between 95.5% and 96.25% (using different cluster numbers), which compares favourably to the 94.5% achieved by SVM. The VSVC training time is between 8.75 s and 17.83 s (for 2–8 clusters), considerably less than the 65.0 s required by SVM. On a real mammography dataset, the best classification accuracy of VSVC is 85.7%, clearly outperforming a standard SVM which obtains an accuracy of only 82.1%. A similar performance improvement is confirmed on two further real datasets, a breast cancer dataset (74.01% vs. 72.52%) and a heart dataset (84.77% vs. 83.81%), coupled with a reduction in learning time (32.07 s vs. 92.08 s and 25.00 s vs. 53.31 s, respectively). Furthermore, VSVC yields a number of support vectors equal to the specified cluster number, and hence a much sparser solution than a standard SVM.

Conclusion

Incorporating a supervised clustering algorithm into the SVM technique leads to a sparse but effective solution, while making the proposed VSVC adaptive to different probability distributions of the training data.

Introduction

Support vector machines (SVMs), first introduced for pattern classification and regression problems by Vapnik and colleagues [1], [2], can be seen as a new training technique for traditional polynomial, radial basis function (RBF) or multi-layer perceptron classifiers by defining relevant kernel functions [3]. SVMs have drawn considerable attention due to their high generalisation ability across a wide range of applications and their typically better performance compared to other traditional learning machines [4], [5], [6].

Rooted in statistical learning theory [7], [8], SVMs are based on a sound theoretical justification in terms of generalisation, convergence, approximation, etc., yet require the assumption that all data points in the training set are independent and identically distributed (i.i.d.) according to some probability distribution. However, in many practical applications, the obtained training data are subject to different probability distributions with respect to different vicinities/clusters. This limits the application of the standard SVM approach to real-world problems.

The assumption that the training data are i.i.d. can be relaxed, and some research explores more general conditions under which successful learning can take place [9]. In particular, relaxations of the independence assumption have been considered in both the machine learning and the statistical learning literature, e.g. in [10], [11], [12], where a weaker notion of mixing is used to replace the notion of independence and it is shown that most of the main results of statistical learning theory continue to hold under this weaker hypothesis.

In contrast, relaxations of the identically distributed assumption are less common, and few attempts have been made to address this limitation, with the exception of the so-called vicinal SVM [8], [13]. The vicinal SVM was originally proposed according to the vicinal risk minimisation (VRM) principle, which introduces a new class of support vector machines by defining appropriate vicinal kernel functions. Its main motivation is to address different probability density functions with respect to different vicinities of the training dataset. A hyperplane is derived by maximising the margin between two classes under the vicinal risk function. The key issue in implementing the vicinal SVM is the construction of vicinal kernel functions for the given training data. Vapnik suggested an input space partitioning scheme [8] in order to define the vicinity of each training point using Laplacian-type and Gaussian-type kernels. However, for the general non-linear case like kernel-based SVMs, applying the VRM principle with input space partitioning is not straightforward due to the non-linear mapping between input data and feature space.
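To make the contrast with standard empirical risk minimisation explicit, the following is a sketch of the general VRM formulation found in the literature cited above (the notation is ours, not the paper's): for l training pairs (x_i, y_i), the empirical risk places a point mass at each training example, whereas the vicinal risk integrates the loss over a local vicinity density centred on each example,

\[
R_{\mathrm{emp}}(f) = \frac{1}{l}\sum_{i=1}^{l} \ell\bigl(f(\mathbf{x}_i), y_i\bigr),
\qquad
R_{\mathrm{vic}}(f) = \frac{1}{l}\sum_{i=1}^{l} \int \ell\bigl(f(\mathbf{x}), y_i\bigr)\, \mathrm{d}P_{\mathbf{x}_i}(\mathbf{x}),
\]

where $\ell$ is the loss function and each $P_{\mathbf{x}_i}$ is a vicinity distribution (e.g. a Laplacian or Gaussian of some width $\sigma$) centred on the training point $\mathbf{x}_i$; choosing $P_{\mathbf{x}_i}$ to be a Dirac mass at $\mathbf{x}_i$ recovers the empirical risk.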

In this paper, we extend the work of [14] and present an effective method to construct new vicinal kernel functions for SVM learning. These are derived based on supervised clustering in the kernel-induced feature space. Our proposed vicinal support vector classifier (VSVC) is suitable for practical applications where the learning data may come from different probability distributions. VSVC proceeds in two phases. In the clustering phase, a supervised kernel-based deterministic annealing (SKDA) clustering algorithm is used to partition the training data into different soft vicinal areas of the feature space in order to construct the vicinal kernel functions. In the training phase, an SVM is constructed so as to minimise the vicinal risk function under the constraints of the respective vicinal areas defined in the clustering phase. Incorporating the supervised clustering technique into SVM learning leads to a sparse solution, while making the proposed VSVC adaptive to different probability distributions of the training data. Experimental results on both artificial and real datasets confirm that our proposed method yields higher classification accuracy and faster training compared to the standard SVM approach.
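To make the two-phase structure concrete, the sketch below (in Python, using scikit-learn) uses a simple annealing-style soft kernel clustering of each class as a stand-in for SKDA, and a membership-averaged RBF kernel between the resulting soft areas as a vicinal-style kernel; all function names, the kernel construction and the parameter values are illustrative assumptions and do not reproduce the paper's exact SKDA equations or vicinal kernel definition.

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def soft_memberships(K, n_clusters, n_iter=50, beta=5.0, seed=0):
    # Simple annealing-style soft kernel clustering, used here as a stand-in
    # for SKDA. Returns U of shape (n_clusters, l); each column sums to one.
    rng = np.random.default_rng(seed)
    U = rng.random((n_clusters, K.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    diag = np.diag(K)
    for _ in range(n_iter):
        W = U / U.sum(axis=1, keepdims=True)           # normalised weights per cluster
        # squared feature-space distance from every point to every soft centre
        d2 = diag[None, :] - 2.0 * (W @ K) + np.einsum('mi,ij,mj->m', W, K, W)[:, None]
        U = np.exp(-beta * np.maximum(d2, 0.0))        # soft (annealed) reassignment
        U /= U.sum(axis=0, keepdims=True)
    return U

def fit_vsvc_like(X, y, clusters_per_class=2, gamma=0.5):
    # Phase 1: cluster each class separately (a simple way of keeping the
    # clustering supervised) and form a membership-averaged kernel between
    # the resulting soft vicinal areas. Phase 2: train an SVM on that kernel.
    rows, labels = [], []
    for label in np.unique(y):
        mask = (y == label)
        U = soft_memberships(rbf_kernel(X[mask], gamma=gamma), clusters_per_class)
        Wc = U / U.sum(axis=1, keepdims=True)          # weights over this class's samples
        Wfull = np.zeros((clusters_per_class, len(y)))
        Wfull[:, mask] = Wc                            # embed into full sample indexing
        rows.append(Wfull)
        labels.append(np.full(clusters_per_class, label))
    W = np.vstack(rows)                                # (n_areas, l) membership weights
    y_area = np.hstack(labels)
    K = rbf_kernel(X, gamma=gamma)
    K_vic = W @ K @ W.T                                # vicinal-style kernel between areas
    clf = SVC(kernel='precomputed').fit(K_vic, y_area)
    return clf, W, X, gamma

def predict_vsvc_like(model, X_new):
    clf, W, X_train, gamma = model
    # kernel values between new points and the vicinal areas used for training
    return clf.predict(rbf_kernel(X_new, X_train, gamma=gamma) @ W.T)

# toy usage: two well-separated Gaussian blobs, one per class
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 1.0, (40, 2)), rng.normal(-2.0, 1.0, (40, 2))])
y = np.hstack([np.ones(40), -np.ones(40)])
model = fit_vsvc_like(X, y)
print('training accuracy:', np.mean(predict_vsvc_like(model, X) == y))

In this sketch the SVM is trained on as many kernel "samples" as there are vicinal areas, which mirrors the sparsity behaviour noted in the abstract, where the number of support vectors is bounded by the specified cluster number.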

The remainder of the paper is organised as follows. A brief review of the VRM principle and the vicinal SVM with input space partition is given in Section 2. Section 3 then presents our proposed VSVC based on supervised feature space partitioning. Experimental results are reported in Section 4, while Section 5 concludes the paper.

Section snippets

Vicinal SVM with input space partitioning

Let us consider the input–output training data pairs

$\{(\mathbf{x}_i, y_i)\}_{i=1}^{l}, \quad \mathbf{x}_i \in \mathbb{R}^n, \quad y_i \in \{-1, +1\},$

where l is the number of input data points and n is the dimension of the input space. In statistical learning theory [8], these training data pairs are normally assumed to be i.i.d. (independent and identically distributed) according to an unknown probability distribution p(x, y). However, in practical applications it is possible that the probability distribution is multi-modal with different vicinities of the

The proposed approach

The basic idea of our proposed VSVC is to construct new vicinal kernel functions, derived through supervised clustering in the feature space. These vicinal kernel functions are then used for SVM learning.
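As an illustration only (the paper's own vicinal kernel definition is given in its equations, which are not reproduced in this snippet), one natural kernel between two soft vicinal areas $V_m$ and $V_n$, with membership weights $u_{mi}$ and $u_{nj}$ over the $l$ training points and a base kernel $k$, is the membership-weighted average

\[
K_{\mathrm{vic}}(V_m, V_n) = \frac{\sum_{i=1}^{l}\sum_{j=1}^{l} u_{mi}\, u_{nj}\, k(\mathbf{x}_i, \mathbf{x}_j)}{\bigl(\sum_{i=1}^{l} u_{mi}\bigr)\bigl(\sum_{j=1}^{l} u_{nj}\bigr)},
\]

which equals the feature-space inner product between the membership-weighted centres of the two areas and is positive semi-definite whenever $k$ is.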

Experimental results

We evaluate the effectiveness of our proposed VSVC algorithm on both artificial and real medical classification problems. The VSVC implementation is based on the standard SVM MATLAB program [25].

Conclusions

In this paper, we have proposed a vicinal support vector classifier which is based on the vicinal risk minimisation principle for data classification. Our approach constructs new vicinal kernel functions by employing a supervised clustering algorithm, supervised kernel-based deterministic annealing, for training a support vector machine. The proposed approach proceeds in two phases: SKDA clustering and SVM learning. The aim of VSVC is to minimise the vicinal risk function under the constraints

Acknowledgment

The authors thank the anonymous reviewers for their insightful comments and valuable suggestions on earlier versions of the paper.

References (31)

  • I. Steinwart et al., Learning from dependent observations, Journal of Multivariate Analysis (2009)
  • J. Leski, Fuzzy c-varieties/elliptotypes clustering in reproducing kernel Hilbert space, Fuzzy Sets and Systems (2004)
  • B. Boser et al., A training algorithm for optimal margin classifiers
  • C. Cortes et al., Support vector networks, Machine Learning (1995)
  • E. Osuna et al., Support vector machines: training and applications, Tech. rep. (1997)
  • N. Cristianini et al., An introduction to support vector machines and other kernel-based learning methods (2000)
  • B. Scholkopf et al., Learning with kernels (2002)
  • K. Muller et al., An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks (2001)
  • V. Vapnik, An overview of statistical learning theory, IEEE Transactions on Neural Networks (1999)
  • V. Vapnik, The nature of statistical learning theory (2000)
  • S. Kulkarni et al., An elementary introduction to statistical learning theory (2011)
  • D. Ryabko, Pattern recognition for conditionally independent data, Journal of Machine Learning Research (2006)
  • M. Vidyasagar, Learning and generalization with applications to neural networks (2000)
  • O. Chapelle et al., Vicinal risk minimization
  • A. Cao et al., Mammographic mass detection by vicinal support vector machine
