Signal Processing

Volume 89, Issue 4, April 2009, Pages 510-522

Nonparallel plane proximal classifier

https://doi.org/10.1016/j.sigpro.2008.10.002

Abstract

We observe that the two costly optimization problems of the twin support vector machine (TWSVM) classifier can be avoided by adopting a technique used in the proximal support vector machine (PSVM) classifier. With this approach we formulate a much simpler nonparallel plane proximal classifier (NPPC) that trains considerably faster than TWSVM by removing most of its computational burden. The NPPC formulation for binary data classification consists of two mean square error (MSE) optimization problems of identical structure, each of which reduces to solving a small system of linear equations in the input space. It therefore eliminates the need for specialized software to solve quadratic programming problems (QPPs). The formulation is also extended to nonlinear kernel classifiers. Our computations show that a MATLAB implementation of NPPC can be trained on a data set of 3 million points with 10 attributes in less than 3 s. Computational results on synthetic as well as several benchmark data sets indicate the advantages of the proposed classifier in both computational time and test accuracy. The experimental results also indicate that, in many cases, classifiers obtained by the MSE approach perform as well as those obtained by the standard SVM approach.

Introduction

The support vector machine (SVM) algorithm is an excellent tool for binary data classification [1], [2], [3], [4]. This learning strategy, introduced by Vapnik and co-workers [1], is a principled and very powerful machine learning method. Within a few years of its introduction SVM had already outperformed most other systems in a wide variety of applications, spanning a broad spectrum of research areas that includes pattern recognition [5], text categorization [6], biomedicine [7], [8], brain–computer interfaces [9], [10], and financial applications [11], [12].

The theory of SVM, proposed by Vapnik, is based on the structural risk minimization (SRM) principle [1], [2], [3]. In its simplest form, SVM for a linearly separable two-class problem finds an optimal hyperplane that maximizes the separation between the two classes. The hyperplane is obtained by solving a quadratic optimization problem. For nonlinearly separable cases the input feature vectors are first mapped into a high-dimensional feature space by a nonlinear kernel function [4], [13], [14], and a linear classifier is then implemented in that feature space to classify the data. One of the main challenges of the standard SVM is its large training time on huge databases, since it has to optimize a computationally expensive cost function. The performance of a trained SVM classifier also depends on an optimal parameter set, which is usually found by cross-validation on a tuning set [15]. The long training time of SVM also prevents one from locating the optimal parameter set on a very fine grid of parameters over a large span. To remove this drawback, various versions of SVM with comparable classification ability have been reported by many researchers. The introduction of proximal-type SVMs [16], [17], [18] removes the above shortcoming of the standard SVM classifier: these classifiers avoid the costly optimization problem of SVM and as a result are very fast. Such formulations of SVM can be interpreted as regularized least squares and considered in the much more general context of regularized networks [19], [20].
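As a hedged illustration of the cross-validation tuning step described above (not the paper's own code, which uses MATLAB implementations; the scikit-learn classes, parameter grid, and toy data below are illustrative assumptions):

```python
# Illustrative grid-search cross-validation for SVM hyperparameters
# (regularization C and RBF kernel width gamma); toy data only.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                      # 200 points, 10 attributes
y = np.sign(X[:, 0] + 0.5 * rng.normal(size=200)).astype(int)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)      # tuned parameter set
```

The cost of such a search grows with the grid resolution, which is why long per-fit training times make fine grids impractical for the standard SVM.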

All the above classifiers discriminate a pattern by determining in which half space it lies. Mangasarian and Wild [21] first proposed a classification method based on the proximity of patterns to one of two nonparallel planes, which they named the generalized eigenvalue proximal support vector machine (GEPSVM) classifier. Instead of finding a single hyperplane, GEPSVM finds two nonparallel hyperplanes such that each plane is clustered around the data of one particular class. To do so, GEPSVM solves two related generalized eigenvalue problems. Although this approach is called an SVM, it is closer to discriminating patterns by the Fisher information criterion [13], [15], because by changing the two-class margin representation from "parallel" to "nonparallel" hyperplanes it switches from a binary to a potentially multi-class approach. The linear kernel GEPSVM is very fast, as it solves two generalized eigenvalue problems of the order of the input space dimension, but its performance is only comparable with the standard SVM and in many cases it gives lower classification rates. Recently, Jayadeva et al. [22] proposed the twin support vector machine (TWSVM) classifier. TWSVM also generates two nonparallel planes, as GEPSVM does, but by a different technique: it solves two smaller quadratic programming problems (QPPs) instead of one large QPP as in the standard SVM [2], [3], [4]. Although both TWSVM and GEPSVM classify data by two nonparallel planes, the former is closer to a typical SVM problem, which does not eliminate the basic assumption of selecting a minimum number of support vectors [23]. TWSVM achieves good classification accuracy, but solving two optimization problems is undesirable in many cases, predominantly for large data sets, because of the higher learning time. This fact motivates us to formulate the proposed classifier so that it has classification ability as good as TWSVM [22] while being as computationally efficient as PSVM [18] or linear GEPSVM [21].
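A rough sketch of the generalized eigenvalue step in GEPSVM may help fix ideas. It assumes the Tikhonov-regularized Rayleigh-quotient formulation of [21]; the function name, the symbols E, F, G, H, and the regularization constant delta are our own notation, not the paper's:

```python
# Hedged sketch of one GEPSVM plane: minimize ||A w + e b||^2 / ||B w + e b||^2
# (regularized), which leads to the generalized eigenproblem G z = lambda H z.
import numpy as np
from scipy.linalg import eig

def gepsvm_plane(A, B, delta=1e-4):
    E = np.hstack([A, np.ones((A.shape[0], 1))])    # [A  e]
    F = np.hstack([B, np.ones((B.shape[0], 1))])    # [B  e]
    G = E.T @ E + delta * np.eye(E.shape[1])        # Tikhonov regularization (assumption)
    H = F.T @ F
    vals, vecs = eig(G, H)                          # generalized eigenvalue problem
    z = np.real(vecs[:, np.argmin(np.real(vals))])  # eigenvector of smallest eigenvalue
    return z[:-1], z[-1]                            # w, b of the plane w^T x + b = 0
```

Both problems are of order n+1, the input space dimension plus one, which explains why the linear GEPSVM trains so quickly.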

In this paper, we propose a binary data classifier named the nonparallel plane proximal classifier (NPPC). NPPC also classifies binary data by its proximity to one of two nonparallel planes. The formulation of NPPC is totally different from that of GEPSVM [21]; its objective functions are similar to those of TWSVM [22] but with a different loss function and with equality constraints instead of inequality constraints. We call this formulation a nonparallel plane proximal classifier (NPPC) rather than an SVM classifier because there is no SRM by margin maximization between the two classes as in the standard SVM; it can instead be interpreted as a classifier obtained by regularized mean square error (MSE) optimization. Most importantly, computational results on several data sets show that the performance of such classifiers obtained by MSE optimization is comparable to, or even better than, that of SVM classifiers, eliminating the need for computationally costly SVM classifiers in many cases.

The rest of this paper is organized as follows. A brief introduction to all the SVM classifiers is given in Section 2. In Section 3 we formulate NPPC for the linear kernel, with two visual examples in two dimensions, and in Section 4 we extend the formulation to nonlinear kernels and demonstrate its performance visually with one example. In Section 5 the performance of the proposed NPPC is compared with other SVM classifiers for linear and nonlinear kernels. Finally, Section 6 concludes the paper.

A brief word on the notation used in this paper [21] is as follows. All vectors are column vectors unless transposed by a superscript T. The inner product of two vectors $x$ and $y$ in the n-dimensional real space $\mathbb{R}^n$ is denoted by $x^T y$, and the two-norm of $x$ is denoted by $\|x\|$. The vector $e$ is a column vector of ones of appropriate dimension, and $I$ is an identity matrix of appropriate dimension. For a matrix $A \in \mathbb{R}^{m \times n}$ containing feature vectors, the ith row $A_i$ is a row vector in $\mathbb{R}^n$. Matrices $A$ and $B$ contain the feature vectors of classes +1 and −1, respectively. For $A \in \mathbb{R}^{m_1 \times n}$ and $C \in \mathbb{R}^{n \times m}$, a kernel $K(A,C)$ maps $\mathbb{R}^{m_1 \times n} \times \mathbb{R}^{n \times m}$ into $\mathbb{R}^{m_1 \times m}$. Only the symmetry of the kernel is assumed [21], without any use of Mercer's positive definiteness condition [2], [3], [4], [13], [14]. The ijth element of the Gaussian kernel [2] used for testing nonlinear classification is given by $(K(A,C))_{ij} = \varepsilon^{-\mu \|A_i^T - C_j\|^2}$, where $i = 1, \ldots, m_1$, $j = 1, \ldots, m$, $\mu$ is a positive constant, $\varepsilon$ is the base of the natural logarithm, and $A$ and $C$ are as described above.
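For concreteness, a minimal sketch of how such a Gaussian kernel matrix could be computed (illustrative code, not from the paper; the function name and the use of NumPy are our own assumptions):

```python
import numpy as np

def gaussian_kernel(A, C, mu):
    """K(A, C)_ij = exp(-mu * ||A_i^T - C_j||^2) for A (m1 x n) and C (n x m)."""
    # Squared Euclidean distances between rows of A and columns of C.
    diff = A[:, None, :] - C.T[None, :, :]           # shape (m1, m, n)
    sq_dist = np.sum(diff ** 2, axis=2)              # shape (m1, m)
    return np.exp(-mu * sq_dist)

# Example: kernel block between a class matrix A (m1 x n) and C (n x m).
A = np.random.rand(5, 3)
C = np.random.rand(3, 8)
K = gaussian_kernel(A, C, mu=0.5)                    # shape (5, 8)
```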

Section snippets

The linear SVM

SVM is a state-of-the-art machine learning algorithm based on the guaranteed risk bounds of statistical learning theory [1], [2], known as the SRM principle. Among the several tutorials in the SVM literature, we refer to [4].

Given m training pairs $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i \in \mathbb{R}^n$ is an input vector labeled by $y_i \in \{+1, -1\}$ for $i = 1, \ldots, m$, the linear SVM classifier searches for an optimal separating hyperplane
$$\omega^T x + b = 0,$$
where $b$ is the bias term and $\omega \in \mathbb{R}^n$ is the normal vector to the hyperplane.
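For reference, this hyperplane is obtained by solving the usual soft-margin primal QPP (the standard formulation of [2], [4], restated here in the notation above):
$$\min_{\omega,\, b,\, \xi}\ \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{m}\xi_i \quad \text{s.t.}\quad y_i(\omega^T x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, m.$$
It is this single large QPP over all m training points that the nonparallel plane methods below replace with two smaller problems.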

The NPPC formulation

In this section, we elaborate the formulation of the classifier, which we name the nonparallel plane proximal classifier (NPPC). In the formulation of NPPC we apply the concepts of both TWSVM [22] and PSVM [18], with some modification, to find two nonparallel planes. To obtain the two nonparallel planes as defined in (6), the linear NPPC (LNPPC) solves the following pair of QPPs:
$$\text{(LNPPC1)}\quad \min_{(\omega_1, b_1, \xi_2)\in\mathbb{R}^{n+1+m_2}}\ \frac{1}{2}\|A\omega_1 + e_1 b_1\|^2 + c_1 e_2^T \xi_2 + \frac{c_2}{2}\xi_2^T \xi_2 \quad \text{s.t.}\quad -(B\omega_1 + e_2 b_1) + \xi_2 = e_2$$
and
$$\text{(LNPPC2)}\quad \min_{(\omega_2, b_2, \xi_1)\in\mathbb{R}^{n+1+m_1}}\ \frac{1}{2}\|B\omega_2 + e_2 b_2\|^2 + c_3 e_1^T \xi_1 + \frac{c_4}{2}\xi_1^T \xi_1 \quad \text{s.t.}\quad (A\omega_2 + e_1 b_2) + \xi_1 = e_1$$
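Substituting each equality constraint into its objective turns the QPP into an unconstrained quadratic whose minimizer is given by a small linear system of order n+1. The sketch below is our own derivation from the formulation above, not the authors' code; the matrix names E and F and the prediction rule by nearest plane are assumptions consistent with the TWSVM/PSVM literature:

```python
# Hedged sketch of linear NPPC training: with E = [A  e1], F = [B  e2] and
# z_k = [w_k; b_k], eliminating the equality constraints gives
#   (E^T E + c2 F^T F) z1 = -(c1 + c2) F^T e2
#   (F^T F + c4 E^T E) z2 =  (c3 + c4) E^T e1
import numpy as np

def train_linear_nppc(A, B, c1, c2, c3, c4):
    m1, m2 = A.shape[0], B.shape[0]
    E = np.hstack([A, np.ones((m1, 1))])
    F = np.hstack([B, np.ones((m2, 1))])
    e1, e2 = np.ones((m1, 1)), np.ones((m2, 1))
    z1 = np.linalg.solve(E.T @ E + c2 * (F.T @ F), -(c1 + c2) * (F.T @ e2))
    z2 = np.linalg.solve(F.T @ F + c4 * (E.T @ E),  (c3 + c4) * (E.T @ e1))
    return z1[:-1], z1[-1], z2[:-1], z2[-1]          # w1, b1, w2, b2

def predict(X, w1, b1, w2, b2):
    # Assign each point to the class whose plane is nearer (perpendicular distance).
    d1 = np.abs(X @ w1 + b1).ravel() / np.linalg.norm(w1)
    d2 = np.abs(X @ w2 + b2).ravel() / np.linalg.norm(w2)
    return np.where(d1 <= d2, 1, -1)
```

Under this sketch the training cost is dominated by forming E^T E and F^T F and solving two (n+1)-dimensional systems, which is consistent with the fast training times reported in the abstract.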

Nonlinear kernel nonparallel plane proximal classifier (NKNPPC) formulation

In this section we extend our formulation to nonlinear classifiers by considering kernel-generated surfaces instead of planes [18], [21], [22].

For the nonlinearly separable case, the input data are first projected into a kernel-generated feature space of the same or higher dimension than the input space. To apply this transformation, let $K(\cdot,\cdot)$ be a nonlinear kernel function and define the augmented matrix $C = [A;\, B] \in \mathbb{R}^{m \times n}$, where $m = m_1 + m_2$ is the total number of patterns in the training set. We now construct
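Although the snippet breaks off here, the construction it points to can be illustrated under the assumption that the nonlinear classifier mirrors the linear case with A and B replaced by their Gaussian kernel blocks against C, in the spirit of [18], [21], [22]. This is a hedged sketch reusing the gaussian_kernel and train_linear_nppc functions above, not the paper's exact derivation:

```python
# Hedged sketch of the kernel NPPC analogue: kernel surfaces
# K(x^T, C^T) u_k + b_k = 0 replace the planes w_k^T x + b_k = 0 (assumption).
import numpy as np

def train_kernel_nppc(A, B, c1, c2, c3, c4, mu):
    C = np.vstack([A, B])                            # augmented matrix, m x n
    KA = gaussian_kernel(A, C.T, mu)                 # m1 x m kernel block
    KB = gaussian_kernel(B, C.T, mu)                 # m2 x m kernel block
    return train_linear_nppc(KA, KB, c1, c2, c3, c4), C

def predict_kernel(X, model, C, mu):
    u1, b1, u2, b2 = model
    KX = gaussian_kernel(X, C.T, mu)
    # Normalizing by ||u_k|| is used as a simple surrogate distance (assumption).
    d1 = np.abs(KX @ u1 + b1).ravel() / np.linalg.norm(u1)
    d2 = np.abs(KX @ u2 + b2).ravel() / np.linalg.norm(u2)
    return np.where(d1 <= d2, 1, -1)
```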

Numerical testing and comparison

To compare the performance of our NPPC, we report results in terms of accuracy and execution time on publicly available benchmark data sets from the UCI Repository [34], which are commonly used for testing machine learning algorithms. All the classification methods are implemented in MATLAB 7 [35] on Windows XP running on a PC with an Intel P4 processor (3.06 GHz) and 1 GB of RAM. We compare both linear and nonlinear kernel classifiers using NPPC, TWSVM [22], GEPSVM [21],

Conclusion

In this paper we essentially compare the performance of classifiers obtained from the margin maximization concept in an MSE framework with that of classifiers obtained by the standard SVM approach. The experimental results on benchmark data sets suggest that SVM may be replaced by simpler optimization problems in several cases, in which we do not have to consider support vectors and inequality constraints. The computational results given in Table 1, Table 2 indicate that MSE optimization is

Acknowledgements

The authors would like to thank the referees for very useful comments and suggestions which greatly improved the presentation. The authors are also grateful to Professor P. Mitra and Professor A. Routray of IIT Kharagpur for their help in the presentation of the paper. Santanu Ghorai acknowledges the financial support of the authority of MCKV Institute of Engineering, Liluah, Howrah 711204, W.B., India, and the All India Council for Technical Education (AICTE, India) in the form of salary and scholarship,

References (39)

  • K.S. Chua, Efficient computations for large least square support vector machine classifiers, Pattern Recognition Lett. (2003)
  • C. Cortes et al., Support vector networks, Machine Learning (1995)
  • V. Vapnik, The Nature of Statistical Learning Theory (1995)
  • N. Cristianini et al. (2000)
  • C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining Knowledge Discovery (1998)
  • S. Lee, A. Verri, Pattern recognition with support vector machines, in: First International Workshop, SVM 2002, ...
  • T. Joachims, C. Ndellec, C. Rouveriol, Text categorization with support vector machines: learning with many relevant ...
  • D. Lin et al. (2000)
  • W.S. Noble, Support Vector Machine Applications in Computational Biology, in: Kernel Methods in Computational Biology (2004)
  • T. Ebrahimi et al., Joint time-frequency-space classification of EEG in a brain–computer interface application, J. Appl. Signal Process. (2003)
  • T.N. Lal et al., Support vector channel selection in BCI, IEEE Trans. Biomed. Eng. (2004)
  • H. Ince, T.B. Trafalis, Support vector machine for regression and applications to financial forecasting, in: ...
  • C.J. Hsu et al., Credit rating analysis with support vector machines and neural networks: a market comparative study, Decision Support Systems (2004)
  • N. Cristianini et al., Kernel Methods for Pattern Analysis (2004)
  • T. Joachims, Making large-scale support vector machine learning practical
  • S. Haykin, Neural Networks—A Comprehensive Foundation, second ed., Pearson Education, 2006, Chapter 4, pp. ...
  • J.A.K. Suykens et al., Least squares support vector machine classifiers, Neural Process. Lett. (1999)
  • J.A.K. Suykens et al., Least Squares Support Vector Machines (2002)
  • G. Fung, O.L. Mangasarian, Proximal support vector machine classifiers, in: 7th International Proceedings on Knowledge ...