Signal Processing

Volume 89, Issue 4, April 2009, Pages 510-522

Nonparallel plane proximal classifier

https://doi.org/10.1016/j.sigpro.2008.10.002

Abstract

We observe that the two costly optimization problems of the twin support vector machine (TWSVM) classifier can be avoided by adopting a technique used in the proximal support vector machine (PSVM) classifier. With this approach we formulate a much simpler nonparallel plane proximal classifier (NPPC) that trains considerably faster than TWSVM by removing most of its computational burden. The NPPC formulation for binary data classification consists of two mean square error (MSE) optimization problems of identical structure, each of which reduces to solving a small system of linear equations in the input space. It therefore eliminates the need for specialized software to solve quadratic programming problems (QPPs). The formulation is also extended to nonlinear kernel classifiers. Our computations show that a MATLAB implementation of NPPC can be trained on a data set of 3 million points with 10 attributes in less than 3 s. Computational results on synthetic as well as several benchmark data sets indicate the advantages of the proposed classifier in both computational time and test accuracy. The experimental results also indicate that, in many cases, classifiers obtained by the MSE approach perform as well as those obtained by the standard SVM approach.

Introduction

The support vector machine (SVM) algorithm is an excellent tool for binary data classification [1], [2], [3], [4]. This learning strategy, introduced by Vapnik and co-workers [1], is a principled and very powerful machine learning method. Within a few years of its introduction SVM had already outperformed most other systems in a wide variety of applications, spanning a broad spectrum of research areas that includes pattern recognition [5], text categorization [6], biomedicine [7], [8], brain–computer interfaces [9], [10], and financial applications [11], [12].

The theory of SVM, proposed by Vapnik, is based on the structural risk minimization (SRM) principle [1], [2], [3]. In its simplest form, SVM for a linearly separable two-class problem finds an optimal hyperplane that maximizes the separation between the two classes. The hyperplane is obtained by solving a quadratic optimization problem. For nonlinearly separable cases the input feature vectors are first mapped into a high-dimensional feature space by a nonlinear kernel function [4], [13], [14], and a linear classifier is then implemented in that feature space to classify the data. One of the main challenges of the standard SVM is its large training time on huge databases, since it has to optimize a computationally expensive cost function. The performance of a trained SVM classifier also depends on an optimal parameter set, which is usually found by cross-validation on a tuning set [15]. The long training time of SVM also prevents one from locating the optimal parameter set on a very fine grid of parameters over a large span. To remove this drawback, various versions of SVM with comparable classification ability have been reported by many researchers. The introduction of proximal-type SVMs [16], [17], [18] removes the above shortcoming of the standard SVM classifier: these classifiers avoid the costly optimization problem of SVM and as a result are very fast. Such formulations of SVM can be interpreted as regularized least squares and considered in the much more general context of regularized networks [19], [20].
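As a hedged illustration of the cross-validation tuning step described above (not the paper's own code, which uses MATLAB implementations; the scikit-learn classes, parameter grid, and toy data below are illustrative assumptions):

```python
# Illustrative grid-search cross-validation for SVM hyperparameters
# (regularization C and RBF kernel width gamma); toy data only.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                      # 200 points, 10 attributes
y = np.sign(X[:, 0] + 0.5 * rng.normal(size=200)).astype(int)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)      # tuned parameter set
```

The cost of such a search grows with the grid resolution, which is why long per-fit training times make fine grids impractical for the standard SVM.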

All the above classifiers discriminate a pattern by determining in which half space it lies. Mangasarian and Wild [21] first proposed a classification method based on the proximity of patterns to one of two nonparallel planes, which they named the generalized eigenvalue proximal support vector machine (GEPSVM) classifier. Instead of finding a single hyperplane, GEPSVM finds two nonparallel hyperplanes such that each plane is clustered around the data of one particular class. To do so, GEPSVM solves two related generalized eigenvalue problems. Although this approach is called an SVM, it is closer to discriminating patterns by the Fisher information criterion [13], [15], because by changing the two-class margin representation from "parallel" to "nonparallel" hyperplanes it switches from a binary to a potentially multi-class approach. The linear kernel GEPSVM is very fast, as it solves two generalized eigenvalue problems of the order of the input space dimension, but its performance is only comparable with the standard SVM and in many cases it gives lower classification rates. Recently, Jayadeva et al. [22] proposed the twin support vector machine (TWSVM) classifier. TWSVM also generates two nonparallel planes, as GEPSVM does, but by a different technique: it solves two smaller quadratic programming problems (QPPs) instead of one large QPP as in the standard SVM [2], [3], [4]. Although both TWSVM and GEPSVM classify data by two nonparallel planes, the former is closer to a typical SVM problem, which does not eliminate the basic assumption of selecting a minimum number of support vectors [23]. TWSVM achieves good classification accuracy, but solving two optimization problems is undesirable in many cases, predominantly for large data sets, because of the higher learning time. This fact motivates us to formulate the proposed classifier so that it has classification ability as good as TWSVM [22] while being as computationally efficient as PSVM [18] or linear GEPSVM [21].
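A rough sketch of the generalized eigenvalue step in GEPSVM may help fix ideas. It assumes the Tikhonov-regularized Rayleigh-quotient formulation of [21]; the function name, the symbols E, F, G, H, and the regularization constant delta are our own notation, not the paper's:

```python
# Hedged sketch of one GEPSVM plane: minimize ||A w + e b||^2 / ||B w + e b||^2
# (regularized), which leads to the generalized eigenproblem G z = lambda H z.
import numpy as np
from scipy.linalg import eig

def gepsvm_plane(A, B, delta=1e-4):
    E = np.hstack([A, np.ones((A.shape[0], 1))])    # [A  e]
    F = np.hstack([B, np.ones((B.shape[0], 1))])    # [B  e]
    G = E.T @ E + delta * np.eye(E.shape[1])        # Tikhonov regularization (assumption)
    H = F.T @ F
    vals, vecs = eig(G, H)                          # generalized eigenvalue problem
    z = np.real(vecs[:, np.argmin(np.real(vals))])  # eigenvector of smallest eigenvalue
    return z[:-1], z[-1]                            # w, b of the plane w^T x + b = 0
```

Both problems are of order n+1, the input space dimension plus one, which explains why the linear GEPSVM trains so quickly.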

In this paper, we propose a binary data classifier named the nonparallel plane proximal classifier (NPPC). NPPC also classifies binary data by its proximity to one of two nonparallel planes. The formulation of NPPC is totally different from that of GEPSVM [21]; its objective functions are similar to those of TWSVM [22] but with a different loss function and with equality constraints instead of inequality constraints. We call this formulation a nonparallel plane proximal classifier (NPPC) rather than an SVM classifier because there is no SRM by margin maximization between the two classes as in the standard SVM; it can instead be interpreted as a classifier obtained by regularized mean square error (MSE) optimization. Most importantly, computational results on several data sets show that the performance of such classifiers obtained by MSE optimization is comparable to, or even better than, that of SVM classifiers, eliminating the need for computationally costly SVM classifiers in many cases.

The rest of this paper is organized as follows. A brief introduction to all the SVM classifiers is given in Section 2. In Section 3 we formulate NPPC for the linear kernel, with two visual examples in two dimensions, and in Section 4 we extend the formulation to nonlinear kernels and demonstrate its performance visually with one example. In Section 5 the performance of the proposed NPPC is compared with other SVM classifiers for linear and nonlinear kernels. Finally, Section 6 concludes the paper.

A brief word on the notation used in this paper [21] is as follows. All vectors are column vectors unless transposed by a superscript T. The inner product of two vectors $x$ and $y$ in the n-dimensional real space $\mathbb{R}^n$ is denoted by $x^T y$, and the two-norm of $x$ is denoted by $\|x\|$. The vector $e$ is a column vector of ones of appropriate dimension, and $I$ is an identity matrix of appropriate dimension. For a matrix $A \in \mathbb{R}^{m \times n}$ containing feature vectors, the ith row $A_i$ is a row vector in $\mathbb{R}^n$. Matrices $A$ and $B$ contain the feature vectors of classes +1 and −1, respectively. For $A \in \mathbb{R}^{m_1 \times n}$ and $C \in \mathbb{R}^{n \times m}$, a kernel $K(A,C)$ maps $\mathbb{R}^{m_1 \times n} \times \mathbb{R}^{n \times m}$ into $\mathbb{R}^{m_1 \times m}$. Only the symmetry of the kernel is assumed [21], without any use of Mercer's positive definiteness condition [2], [3], [4], [13], [14]. The ijth element of the Gaussian kernel [2] used for testing nonlinear classification is given by $(K(A,C))_{ij} = \varepsilon^{-\mu \|A_i^T - C_j\|^2}$, where $i = 1, \ldots, m_1$, $j = 1, \ldots, m$, $\mu$ is a positive constant, $\varepsilon$ is the base of the natural logarithm, and $A$ and $C$ are as described above.
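For concreteness, a minimal sketch of how such a Gaussian kernel matrix could be computed (illustrative code, not from the paper; the function name and the use of NumPy are our own assumptions):

```python
import numpy as np

def gaussian_kernel(A, C, mu):
    """K(A, C)_ij = exp(-mu * ||A_i^T - C_j||^2) for A (m1 x n) and C (n x m)."""
    # Squared Euclidean distances between rows of A and columns of C.
    diff = A[:, None, :] - C.T[None, :, :]           # shape (m1, m, n)
    sq_dist = np.sum(diff ** 2, axis=2)              # shape (m1, m)
    return np.exp(-mu * sq_dist)

# Example: kernel block between a class matrix A (m1 x n) and C (n x m).
A = np.random.rand(5, 3)
C = np.random.rand(3, 8)
K = gaussian_kernel(A, C, mu=0.5)                    # shape (5, 8)
```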

Section snippets

The linear SVM

SVM is a state-of-the-art machine learning algorithm based on the guaranteed risk bounds of statistical learning theory [1], [2], known as the SRM principle. Among the several tutorials in the SVM literature, we refer to [4].

Given m training pairs $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i \in \mathbb{R}^n$ is an input vector labeled by $y_i \in \{+1, -1\}$ for $i = 1, \ldots, m$, the linear SVM classifier searches for an optimal separating hyperplane
$$\omega^T x + b = 0,$$
where $b$ is the bias term and $\omega \in \mathbb{R}^n$ is the normal vector to the hyperplane.
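For reference, this hyperplane is obtained by solving the usual soft-margin primal QPP (the standard formulation of [2], [4], restated here in the notation above):
$$\min_{\omega,\, b,\, \xi}\ \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{m}\xi_i \quad \text{s.t.}\quad y_i(\omega^T x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, m.$$
It is this single large QPP over all m training points that the nonparallel plane methods below replace with two smaller problems.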

The NPPC formulation

In this section, we elaborate the formulation of the classifier, which we name the nonparallel plane proximal classifier (NPPC). In the formulation of NPPC we apply the concepts of both TWSVM [22] and PSVM [18], with some modification, to find two nonparallel planes. To obtain the two nonparallel planes as defined in (6), the linear NPPC (LNPPC) solves the following pair of QPPs:
$$\text{(LNPPC1)}\quad \min_{(\omega_1, b_1, \xi_2)\in\mathbb{R}^{n+1+m_2}}\ \frac{1}{2}\|A\omega_1 + e_1 b_1\|^2 + c_1 e_2^T \xi_2 + \frac{c_2}{2}\xi_2^T \xi_2 \quad \text{s.t.}\quad -(B\omega_1 + e_2 b_1) + \xi_2 = e_2$$
and
$$\text{(LNPPC2)}\quad \min_{(\omega_2, b_2, \xi_1)\in\mathbb{R}^{n+1+m_1}}\ \frac{1}{2}\|B\omega_2 + e_2 b_2\|^2 + c_3 e_1^T \xi_1 + \frac{c_4}{2}\xi_1^T \xi_1 \quad \text{s.t.}\quad (A\omega_2 + e_1 b_2) + \xi_1 = e_1$$
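Substituting each equality constraint into its objective turns the QPP into an unconstrained quadratic whose minimizer is given by a small linear system of order n+1. The sketch below is our own derivation from the formulation above, not the authors' code; the matrix names E and F and the prediction rule by nearest plane are assumptions consistent with the TWSVM/PSVM literature:

```python
# Hedged sketch of linear NPPC training: with E = [A  e1], F = [B  e2] and
# z_k = [w_k; b_k], eliminating the equality constraints gives
#   (E^T E + c2 F^T F) z1 = -(c1 + c2) F^T e2
#   (F^T F + c4 E^T E) z2 =  (c3 + c4) E^T e1
import numpy as np

def train_linear_nppc(A, B, c1, c2, c3, c4):
    m1, m2 = A.shape[0], B.shape[0]
    E = np.hstack([A, np.ones((m1, 1))])
    F = np.hstack([B, np.ones((m2, 1))])
    e1, e2 = np.ones((m1, 1)), np.ones((m2, 1))
    z1 = np.linalg.solve(E.T @ E + c2 * (F.T @ F), -(c1 + c2) * (F.T @ e2))
    z2 = np.linalg.solve(F.T @ F + c4 * (E.T @ E),  (c3 + c4) * (E.T @ e1))
    return z1[:-1], z1[-1], z2[:-1], z2[-1]          # w1, b1, w2, b2

def predict(X, w1, b1, w2, b2):
    # Assign each point to the class whose plane is nearer (perpendicular distance).
    d1 = np.abs(X @ w1 + b1).ravel() / np.linalg.norm(w1)
    d2 = np.abs(X @ w2 + b2).ravel() / np.linalg.norm(w2)
    return np.where(d1 <= d2, 1, -1)
```

Under this sketch the training cost is dominated by forming E^T E and F^T F and solving two (n+1)-dimensional systems, which is consistent with the fast training times reported in the abstract.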

Nonlinear kernel nonparallel plane proximal classifier (NKNPPC) formulation

In this section we extend our formulation to nonlinear classifiers by considering kernel-generated surfaces instead of planes [18], [21], [22].

For the nonlinearly separable case, the input data are first projected into a kernel-generated feature space of the same or higher dimension than the input space. To apply this transformation, let $K(\cdot,\cdot)$ be a nonlinear kernel function and define the augmented matrix $C = [A;\, B] \in \mathbb{R}^{m \times n}$, where $m = m_1 + m_2$ is the total number of patterns in the training set. We now construct
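Although the snippet breaks off here, the construction it points to can be illustrated under the assumption that the nonlinear classifier mirrors the linear case with A and B replaced by their Gaussian kernel blocks against C, in the spirit of [18], [21], [22]. This is a hedged sketch reusing the gaussian_kernel and train_linear_nppc functions above, not the paper's exact derivation:

```python
# Hedged sketch of the kernel NPPC analogue: kernel surfaces
# K(x^T, C^T) u_k + b_k = 0 replace the planes w_k^T x + b_k = 0 (assumption).
import numpy as np

def train_kernel_nppc(A, B, c1, c2, c3, c4, mu):
    C = np.vstack([A, B])                            # augmented matrix, m x n
    KA = gaussian_kernel(A, C.T, mu)                 # m1 x m kernel block
    KB = gaussian_kernel(B, C.T, mu)                 # m2 x m kernel block
    return train_linear_nppc(KA, KB, c1, c2, c3, c4), C

def predict_kernel(X, model, C, mu):
    u1, b1, u2, b2 = model
    KX = gaussian_kernel(X, C.T, mu)
    # Normalizing by ||u_k|| is used as a simple surrogate distance (assumption).
    d1 = np.abs(KX @ u1 + b1).ravel() / np.linalg.norm(u1)
    d2 = np.abs(KX @ u2 + b2).ravel() / np.linalg.norm(u2)
    return np.where(d1 <= d2, 1, -1)
```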

Numerical testing and comparison

To compare the performance of our NPPC, we report results in terms of accuracy and execution time on publicly available benchmark data sets from the UCI Repository [34], which are commonly used for testing machine learning algorithms. All the classification methods are implemented in MATLAB 7 [35] on Windows XP running on a PC with an Intel P4 processor (3.06 GHz) and 1 GB of RAM. We compare both linear and nonlinear kernel classifiers using NPPC, TWSVM [22], GEPSVM [21],

Conclusion

In this paper we essentially compare the performance of classifiers obtained from the margin maximization concept in an MSE framework with that of classifiers obtained by the standard SVM approach. The experimental results on benchmark data sets suggest that SVM may be replaced by simpler optimization problems in several cases, in which we do not have to consider support vectors and inequality constraints. The computational results given in Table 1, Table 2 indicate that MSE optimization is

Acknowledgements

The authors would like to thank the referees for very useful comments and suggestions which greatly improved the presentation. The authors are also grateful to Professor P. Mitra and Professor A. Routray of IIT Kharagpur for their help in the presentation of the paper. Santanu Ghorai acknowledges the financial support of the authority of MCKV Institute of Engineering, Liluah, Howrah 711204, W.B., India, and the All India Council for Technical Education (AICTE, India) in the form of salary and scholarship,

References (39)

  • K.S. Chua, Efficient computations for large least square support vector machine classifiers, Pattern Recognition Lett. (2003)
  • C. Cortes et al., Support vector networks, Machine Learning (1995)
  • V. Vapnik, The Nature of Statistical Learning Theory (1995)
  • N. Cristianini et al. (2000)
  • C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining Knowledge Discovery (1998)
  • S. Lee, A. Verri, Pattern recognition with support vector machines, in: First International Workshop, SVM 2002, ...
  • T. Joachims, C. Ndellec, C. Rouveriol, Text categorization with support vector machines: learning with many relevant ...
  • D. Lin et al. (2000)
  • W.S. Noble, Support Vector Machine Applications in Computational Biology, in: Kernel Methods in Computational Biology (2004)
  • T. Ebrahimi et al., Joint time-frequency-space classification of EEG in a brain–computer interface application, J. Appl. Signal Process. (2003)
  • T.N. Lal et al., Support vector channel selection in BCI, IEEE Trans. Biomed. Eng. (2004)
  • H. Ince, T.B. Trafalis, Support vector machine for regression and applications to financial forecasting, in: ...
  • C.J. Hsu et al., Credit rating analysis with support vector machines and neural networks: a market comparative study, Decision Support Systems (2004)
  • N. Cristianini et al., Kernel Methods for Pattern Analysis (2004)
  • T. Joachims, Making large-scale support vector machine learning practical
  • S. Haykin, Neural Networks—A Comprehensive Foundation, second ed., Pearson Education, 2006, Chapter 4, pp. ...
  • J.A.K. Suykens et al., Least squares support vector machine classifiers, Neural Process. Lett. (1999)
  • J.A.K. Suykens et al., Least Squares Support Vector Machines (2002)
  • G. Fung, O.L. Mangasarian, Proximal support vector machine classifiers, in: 7th International Proceedings on Knowledge ...