Vicinal support vector classifier using supervised kernel-based clustering
Introduction
Support vector machines (SVMs), first introduced for pattern classification and regression problems by Vapnik and colleagues [1], [2], can be seen as a new training technique for traditional polynomial, radial basis function (RBF) or multi-layer perceptron classifiers, obtained by defining relevant kernel functions [3]. SVMs have drawn considerable attention due to their high generalisation ability across a wide range of applications and their typically better performance compared to other traditional learning machines [4], [5], [6].
Rooted in statistical learning theory [7], [8], SVMs enjoy a sound theoretical justification in terms of generalisation, convergence, approximation, etc., yet they require the assumption that all data points in the training set are independent and identically distributed (i.i.d.) according to some probability distribution. However, in many practical applications the training data obtained are subject to different probability distributions in different vicinities/clusters. This limits the applicability of the standard SVM approach to real-world problems.
The assumption that the training data are i.i.d. can be relaxed, and some research explores more general conditions under which successful learning can take place [9]. In particular, relaxations of the independence assumption have been considered in both the machine learning and the statistical learning literature, e.g. in [10], [11], [12], where a weaker notion of mixing is used in place of independence and most of the main results of statistical learning theory are shown to continue to hold under this weaker hypothesis.
In contrast, relaxations of the identically distributed assumption are less common, and few attempts have been made to address this limitation, a notable exception being the so-called vicinal SVM [8], [13]. The vicinal SVM was originally proposed according to the vicinal risk minimisation (VRM) principle, which introduces a new class of support vector machines by defining appropriate vicinal kernel functions. Its main motivation is to handle different probability density functions in different vicinities of the training dataset. A hyperplane is derived by maximising the margin between two classes under the vicinal risk function. The key issue in implementing the vicinal SVM is the construction of vicinal kernel functions for the given training data. Vapnik suggested an input space partitioning scheme [8] that defines the vicinity of each training point using Laplacian-type and Gaussian-type kernels. However, for the general non-linear case such as kernel-based SVMs, applying the VRM principle with input space partitioning is not straightforward because of the non-linear mapping between the input data and the feature space.
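For concreteness, the VRM principle [8], [13] can be stated as follows (a standard formulation in our notation, not a verbatim quote from those works). Under empirical risk minimisation one minimises

$$R_{emp}(f) = \frac{1}{l} \sum_{i=1}^{l} \ell(f(x_i), y_i),$$

whereas VRM smooths each training point into a vicinity distribution $P_{x_i}$ and minimises the vicinal risk

$$R_{vic}(f) = \frac{1}{l} \sum_{i=1}^{l} \int \ell(f(x), y_i)\, dP_{x_i}(x).$$

Choosing $P_{x_i} = \delta_{x_i}$ (a point mass at $x_i$) recovers empirical risk minimisation, while Gaussian vicinities $P_{x_i} = \mathcal{N}(x_i, \sigma_i^2 I)$ yield the Gaussian-type scheme mentioned above.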
In this paper, we extend the work of [14] and present an effective method to construct new vicinal kernel functions for SVM learning. These are derived through supervised clustering in the kernel-induced feature space. Our proposed vicinal support vector classifier (VSVC) is suitable for practical applications where the learning data may come from different probability distributions. VSVC proceeds in two phases. In the clustering phase, a supervised kernel-based deterministic annealing (SKDA) clustering algorithm is used to partition the training data into different soft vicinal areas of the feature space in order to construct the vicinal kernel functions. In the training phase, an SVM is constructed so as to minimise the vicinal risk function under the constraints of the respective vicinal areas defined in the clustering phase. Incorporating the supervised clustering technique into SVM learning leads to a sparse solution, while making the proposed VSVC adaptive to different probability distributions of the training data. Experimental results on both artificial and real datasets confirm that our proposed method yields higher classification accuracy and faster training compared to the standard SVM approach.
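For illustration only, the two-phase structure can be sketched as follows. This is not the authors' implementation: a plain annealed kernel-clustering step stands in for SKDA (the label supervision of the real algorithm, and its annealing schedule, are omitted for brevity), the membership-weighted kernel below is one simple way to realise a vicinal kernel, and all function names are ours.

```python
# Illustrative sketch of the two-phase VSVC-style pipeline (not the authors' code).
# Phase 1: soft clustering in the kernel-induced feature space (a simplified,
# unsupervised stand-in for SKDA). Phase 2: SVM training on a vicinal kernel
# built from the soft memberships.
import numpy as np
from sklearn.svm import SVC

def rbf_kernel(X, Z, gamma=0.5):
    """Gaussian base kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def annealed_soft_clustering(K, n_clusters=4, beta=2.0, n_iter=60, seed=0):
    """Soft memberships via kernel k-means with a Gibbs (annealed) assignment
    step at fixed inverse temperature beta; true deterministic annealing would
    gradually increase beta. Returns an (l, c) row-stochastic matrix M."""
    rng = np.random.default_rng(seed)
    l = K.shape[0]
    M = rng.dirichlet(np.ones(n_clusters), size=l)   # random soft start
    for _ in range(n_iter):
        W = M / (M.sum(axis=0) + 1e-12)              # normalised cluster weights
        # squared feature-space distance of every point to every cluster mean
        d2 = (np.diag(K)[:, None]
              - 2.0 * (K @ W)
              + np.einsum('ic,ij,jc->c', W, K, W)[None, :])
        M = np.exp(-beta * (d2 - d2.min(axis=1, keepdims=True)))
        M /= M.sum(axis=1, keepdims=True)            # soft reassignment
    return M

def vicinal_kernel(K, M):
    """One simple vicinal kernel: the base kernel damped by how strongly two
    points share vicinities (a Schur product of PSD matrices stays PSD)."""
    return K * (M @ M.T)

# Toy usage on two Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)
K = rbf_kernel(X, X)
M = annealed_soft_clustering(K)
clf = SVC(kernel='precomputed', C=10.0).fit(vicinal_kernel(K, M), y)
# Predicting new points would additionally require their memberships, i.e.
# K_vic(x, x_i) = k(x, x_i) * <m(x), m(x_i)>, with m(x) from the same update.
```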
The remainder of the paper is organised as follows. A brief review of the VRM principle and the vicinal SVM with input space partition is given in Section 2. Section 3 then presents our proposed VSVC based on supervised feature space partitioning. Experimental results are reported in Section 4, while Section 5 concludes the paper.
Section snippets
Vicinal SVM with input space partitioning
Let us consider the input–output training data pairs

$$(x_1, y_1), \ldots, (x_l, y_l), \qquad x_i \in \mathbb{R}^n,\; y_i \in \{-1, +1\},$$

where l is the number of input data points and n is the dimension of the input space. In statistical learning theory [8], these training data pairs are normally assumed to be i.i.d. (independent and identically distributed) according to an unknown probability distribution p(x, y). However, in practical applications the distribution may be multi-modal, with different vicinities of the training data governed by different distributions.
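In outline (our notation, following the scheme suggested in [8]): each training point $x_i$ is assigned a vicinity distribution $P_i$, e.g. Gaussian or Laplacian, and the vicinal kernel averages a base kernel $k$ over these vicinities:

$$K_{vic}(x_i, x_j) = \iint k(u, v)\, dP_i(u)\, dP_j(v), \qquad K_{vic}(x_i, x) = \int k(u, x)\, dP_i(u),$$

with the first form entering the Gram matrix during training and the second the decision function at test time. For Gaussian vicinities combined with a Gaussian base kernel these integrals admit closed forms, which is what makes input space partitioning tractable; for a general non-linear feature map they do not, motivating the feature space partitioning of Section 3.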
The proposed approach
The basic idea of our proposed VSVC is to construct new vicinal kernel functions, derived through supervised clustering in the feature space. These vicinal kernel functions are then used for SVM learning.
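The snippet above does not spell out the construction, but one natural form, stated here only as an illustration and not necessarily the authors' exact definition, modulates a base kernel $k$ by the soft memberships $m_c(\cdot)$ produced by SKDA:

$$K_{vic}(x_i, x_j) = k(x_i, x_j) \sum_{c=1}^{C} m_c(x_i)\, m_c(x_j).$$

Pairs that share a soft vicinal area retain their full similarity, while pairs in different vicinities are attenuated; as a Schur product of positive semi-definite matrices, such a $K_{vic}$ remains a valid kernel.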
Experimental results
We evaluate the effectiveness of our proposed VSVC algorithm on both artificial and real medical classification problems. The VSVC implementation builds on the standard SVM MATLAB program [25].
Conclusions
In this paper, we have proposed a vicinal support vector classifier based on the vicinal risk minimisation principle for data classification. Our approach constructs new vicinal kernel functions by employing a supervised clustering algorithm, supervised kernel-based deterministic annealing, to train a support vector machine. The proposed approach proceeds in two phases: SKDA clustering and SVM learning. The aim of VSVC is to minimise the vicinal risk function under the constraints of the vicinal areas defined in the clustering phase.
Acknowledgment
The authors thank the anonymous reviewers for their insightful comments and valuable suggestions on earlier versions of the paper.
References (31)
- et al., Learning from dependent observations, Journal of Multivariate Analysis (2009)
- Fuzzy c-varieties/elliptotypes clustering in reproducing kernel Hilbert space, Fuzzy Sets and Systems (2004)
- et al., A training algorithm for optimal margin classifiers
- et al., Support vector networks, Machine Learning (1995)
- et al., Support vector machines: training and applications, Tech. rep. (1997)
- et al., An introduction to support vector machines and other kernel-based learning methods (2000)
- et al., Learning with kernels (2002)
- et al., An introduction to kernel based learning algorithms, IEEE Transactions on Neural Networks (2001)
- An overview of statistical learning theory, IEEE Transactions on Neural Networks (1999)
- The nature of statistical learning theory (2000)
- An elementary introduction to statistical learning theory
- Pattern recognition for conditionally independent data, Journal of Machine Learning Research
- Learning and generalization with applications to neural networks
- Vicinal risk minimization
- Mammographic mass detection by vicinal support vector machine