1 Introduction

As a powerful tool in machine learning, support vector machines (SVMs) have gained a great deal of attention in a wide variety of fields [1,2,3,4,5]. The classical support vector classifier (SVC) maximizes the margin between two parallel hyperplanes, which leads to solving a convex quadratic programming problem (QPP). Furthermore, non-parallel hyperplane classifiers such as the generalized eigenvalue proximal support vector machine (GEPSVM) and the twin support vector machine (TWSVM) have been proposed in [4, 5]. TWSVM seeks two non-parallel proximal hyperplanes such that each hyperplane is closer to one class and as far as possible from the other. As an efficient generalization of the classical SVC, TWSVM solves two small QPPs instead of a single large one, and it has been studied extensively [6,7,8,9,10,11,12,13,14]. This paper proposes a novel large margin proximal non-parallel support vector machine for binary classification (PNSVM). The main contributions of PNSVM are:

  • PNSVM minimizes the structural risk by imposing a regularization term.

  • Each PNSVM hyperplane is proximal to both classes and maximizes the corresponding margin, whereas each TWSVM hyperplane is proximal to one class and as far as possible from the other class.

  • PNSVM can be solved efficiently with the dual coordinate descent method [15].

2 Related Work

Benefiting from the excellent generalization performance of twin support vector machines, approaches that construct non-parallel hyperplanes have received extensive attention [6, 8, 9, 11,12,13,14, 16]. Shao et al. [6] present a variant of GEPSVM based on a difference measure, which appears to be superior in both classification accuracy and computation time. For imbalanced data classification, Shao et al. [8] propose an efficient weighted Lagrangian twin support vector machine (WLTSVM), which is robust to outliers and overcomes the bias phenomenon of the original TWSVM. Liu et al. [9] present a support vector machine for large-scale regression based on minimizing the deviation distribution. Tian et al. [11] propose a novel nonparallel SVM for binary classification, which implements structural risk minimization and is suitable for large-scale problems. Shao et al. [16] present a sparse \(L_q\)-norm least squares support vector machine, in which feature selection and prediction are performed simultaneously.

3 PNSVM

3.1 Linear PNSVM

Linear PNSVM finds two non-parallel hyperplanes \(f_1(\varvec{x})=\varvec{w}_1^T\varvec{x}+b_1=0\) and \(f_2(\varvec{x})=\varvec{w}_2^T\varvec{x}+b_2=0\) by solving the following two problems:

$$\begin{aligned} \min _{\varvec{w}_1,b_1,\varvec{p},\varvec{q}_1,\rho _1}~~&\frac{1}{2}\varvec{p}^T\varvec{p}+\frac{1}{2}c_1(\Vert \varvec{w}_1\Vert ^2+b_1^2)+c_2\varvec{e}_2^T\varvec{q}_1+c_3\rho _1 \\ \text {s.t.}~~&\varvec{Aw}_1+\varvec{e}_1b_1=\varvec{p},\\ &-(\varvec{Bw}_1+\varvec{e}_2b_1)\ge \varvec{e}_2-\varvec{q}_1,\\ &-(\varvec{Bw}_1+\varvec{e}_2b_1)\le \rho _1\varvec{e}_2-\varvec{q}_1,\\ &\varvec{q}_1\ge 0,~\rho _1\ge 1 \end{aligned}$$
(1)

and

$$\begin{aligned} \min _{\varvec{w}_2,b_2,\varvec{\tilde{p}},\varvec{q}_2,\rho _2}~~&\frac{1}{2}\varvec{\tilde{p}}^T\varvec{\tilde{p}}+\frac{1}{2}c_4(\Vert \varvec{w}_2\Vert ^2+b_2^2)+c_5\varvec{e}_1^T\varvec{q}_2+c_6\rho _2 \\ \text {s.t.}~~&\varvec{Bw}_2+\varvec{e}_2b_2=\varvec{\tilde{p}},\\ &\varvec{Aw}_2+\varvec{e}_1b_2\ge \varvec{e}_1-\varvec{q}_2,\\ &\varvec{Aw}_2+\varvec{e}_1b_2\le \rho _2\varvec{e}_1-\varvec{q}_2,\\ &\varvec{q}_2\ge 0,~\rho _2\ge 1 \end{aligned}$$
(2)

where \(c_i, i=1,2,\cdots ,6\) are parameters, \(\varvec{A}\in \mathbf {R}^{m_1\times n}\) and \(\varvec{B}\in \mathbf {R}^{m_2\times n}\) are the matrices whose rows are the training points of the positive and negative classes, respectively, and \(\varvec{e}_1\in \mathbf {R}^{m_1}\) and \(\varvec{e}_2\in \mathbf {R}^{m_2}\) are vectors of ones.

The formulation (1) can be understood as follows. The first term in (1) is the sum of squared distances from \(f_1(\varvec{x})=\varvec{w}_1^T\varvec{x}+b_1=0\) to the points of the positive class, so its minimization keeps \(f_1(\varvec{x})=0\) close to the positive class. The minimization of the second term, \(\frac{1}{2}c_1(\Vert \varvec{w}_1\Vert ^2+b_1^2)\), implements the structural risk minimization principle. The constraints on the negative class require the hyperplane \(f_1(\varvec{x})=\varvec{w}_1^T\varvec{x}+b_1\) to be at a distance of at least 1 and at most \(\rho _1\) from the points of the negative class. The third term in (1) penalizes mis-classification, and the last term keeps the points of the negative class from lying too far away from the hyperplane. A similar interpretation applies to the formulation (2).
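Written per sample, and consistently with the Lagrangian given below, the constraints of (1) on the \(j\)-th negative point \(\varvec{x}_j\) read

$$\begin{aligned} 1-(\varvec{q}_1)_j\le -(\varvec{w}_1^T\varvec{x}_j+b_1)\le \rho _1-(\varvec{q}_1)_j,\qquad (\varvec{q}_1)_j\ge 0, \end{aligned}$$

so that, up to the slack \((\varvec{q}_1)_j\), the negative points are kept at a (functional) margin between 1 and \(\rho _1\) from \(f_1(\varvec{x})=0\).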

The Lagrangian of the formulation (1) is given by

$$\begin{aligned}&\mathbf {L}(\varvec{w}_1,b_1,\varvec{p},\varvec{q}_1,\rho _1;\varvec{\alpha }_1,\varvec{\alpha }_2,\varvec{\alpha }_3,\varvec{\alpha }_4,\beta ) \\= & {} \frac{1}{2}c_1 (\Vert \varvec{w}_1\Vert ^2+b_1^2)+\frac{1}{2}\varvec{p}^T\varvec{p}+c_2\varvec{e}_2^T\varvec{q}_1+c_3\rho _1 +\varvec{\alpha }_1^T(\varvec{Aw}_1+\varvec{e}_1b_1-\varvec{p})\nonumber \\- & {} \varvec{\alpha }_2^T(-(\varvec{Bw}_1+\varvec{e}_2b_1)+\varvec{q}_1-\varvec{e}_2)-\varvec{\alpha }_3^T(\rho _1 \varvec{e}_2 + (\varvec{Bw}_1+\varvec{e}_2b_1)-\varvec{q}_1)\nonumber \\- & {} \varvec{\alpha }_4^T\varvec{q}_1-\beta (\rho _1-1) \end{aligned}$$

where \(\varvec{\alpha }_1\in \mathbf {R}^{m_1\times 1}, \varvec{\alpha }_2\in \mathbf {R}^{m_2\times 1},\varvec{\alpha }_3\in \mathbf {R}^{m_2\times 1},\varvec{\alpha }_4\in \mathbf {R}^{m_2\times 1}\) and \(\beta \in \mathbf {R}\) are the Lagrange multipliers. The optimality conditions for \(\varvec{w}_1,b_1,\varvec{p},\varvec{q}_1,\rho _1\) are given by

$$\begin{aligned} \nabla _{\varvec{w}_1}\mathbf {L}= & {} c_1\varvec{w}_1+\varvec{A}^T\varvec{\alpha }_1+\varvec{B}^T\varvec{\alpha }_2 - \varvec{B}^T\varvec{\alpha }_3 =0 \end{aligned}$$
(3)
$$\begin{aligned} \nabla _{b_1}\mathbf {L}= & {} c_1b_1+\varvec{e}_1^T\varvec{\alpha }_1+\varvec{e}_2^T\varvec{\alpha }_2-\varvec{e}_2^T\varvec{\alpha }_3=0\end{aligned}$$
(4)
$$\begin{aligned} \nabla _{\varvec{p}}\mathbf {L}= & {} \varvec{\alpha }_1-\varvec{p}=0\end{aligned}$$
(5)
$$\begin{aligned} \nabla _{\varvec{q}_1}\mathbf {L}= & {} c_2\varvec{e}_2-\varvec{\alpha }_2+\varvec{\alpha }_3-\varvec{\alpha }_4=0\end{aligned}$$
(6)
$$\begin{aligned} \nabla _{\rho _1}\mathbf {L}= & {} c_3-\varvec{e}_2^T \varvec{\alpha }_3 -\beta =0 \end{aligned}$$
(7)
$$\begin{aligned}&\varvec{\alpha }_2\ge 0,\varvec{\alpha }_3\ge 0,\varvec{\alpha }_4\ge 0, \beta \ge 0. \end{aligned}$$
(8)
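In particular, conditions (3), (4) and (5) express the primal variables in terms of the multipliers as

$$\begin{aligned} \varvec{w}_1=-\frac{1}{c_1}\left( \varvec{A}^T\varvec{\alpha }_1+\varvec{B}^T(\varvec{\alpha }_2-\varvec{\alpha }_3)\right) ,\quad b_1=-\frac{1}{c_1}\left( \varvec{e}_1^T\varvec{\alpha }_1+\varvec{e}_2^T(\varvec{\alpha }_2-\varvec{\alpha }_3)\right) ,\quad \varvec{p}=\varvec{\alpha }_1, \end{aligned}$$

and it is through these expressions that \((\varvec{w}_1,b_1)\) is recovered once the dual problem is solved.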

Then, substituting (3)–(7) into the Lagrangian, we obtain the dual problem of problem (1):

$$\begin{aligned} \max _{\varvec{\alpha }_1,\varvec{\alpha }_2,\varvec{\alpha }_3}~~&-\frac{1}{2c_1}\varvec{\pi }^T\varvec{\bar{H}}\varvec{\pi }+\varvec{e}_2^T(\varvec{\alpha }_2-\varvec{\alpha }_3) \\ \text {s.t.}~~&\varvec{\alpha }_2\ge 0,~\varvec{\alpha }_3\ge 0,~\varvec{\alpha }_2-\varvec{\alpha }_3\le c_2\varvec{e}_2,~\varvec{e}_2^T\varvec{\alpha }_3\le c_3, \end{aligned}$$
(9)

where \(\varvec{\pi }=(\varvec{\alpha }_1^T,\varvec{\alpha }_2^T,\varvec{\alpha }_3^T)^T\) and

$$\begin{aligned} \varvec{\bar{H}}= & {} \left[ \begin{array}{c c c} \varvec{AA}^T+c_1\varvec{I}&{}\varvec{AB}^T&{}-\varvec{AB}^T\\ \varvec{BA}^T&{}\varvec{BB}^T&{}-\varvec{BB}^T\\ -\varvec{BA}^T&{}-\varvec{BB}^T&{}\varvec{BB}^T \end{array} \right] + \left[ \begin{array}{c c c} \varvec{e}_1\varvec{e}_1^T&{}\varvec{e}_1\varvec{e}_2^T&{}-\varvec{e}_1\varvec{e}_2^T\\ \varvec{e}_2\varvec{e}_1^T&{}\varvec{e}_2\varvec{e}_2^T&{}-\varvec{e}_2\varvec{e}_2^T\\ -\varvec{e}_2\varvec{e}_1^T&{}-\varvec{e}_2\varvec{e}_2^T&{}\varvec{e}_2\varvec{e}_2^T \end{array} \right] . \end{aligned}$$
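Note that \(\varvec{\bar{H}}\) has the compact form \(\varvec{G}\varvec{G}^T+\varvec{E}\varvec{E}^T\) plus \(c_1\varvec{I}\) in its leading \(m_1\times m_1\) block, where \(\varvec{G}=[\varvec{A};\varvec{B};-\varvec{B}]\) and \(\varvec{E}=[\varvec{e}_1;\varvec{e}_2;-\varvec{e}_2]\) are row-wise stacks. As an illustration only, the following minimal NumPy sketch assembles \(\varvec{\bar{H}}\); the function and variable names are illustrative and not taken from the paper's Matlab implementation.

```python
import numpy as np

def build_H_bar(A, B, c1):
    """Assemble the matrix H_bar of the dual problem (9).

    A  : (m1, n) array whose rows are the positive-class points
    B  : (m2, n) array whose rows are the negative-class points
    c1 : regularization parameter of problem (1)
    """
    m1 = A.shape[0]
    m2 = B.shape[0]
    G = np.vstack([A, B, -B])                                    # [A; B; -B]
    E = np.concatenate([np.ones(m1), np.ones(m2), -np.ones(m2)])
    H = G @ G.T + np.outer(E, E)                                 # G G^T + E E^T
    H[:m1, :m1] += c1 * np.eye(m1)                               # c1*I in the leading block only
    return H
```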

Similarly, denoting by \(\varvec{\lambda }_1\in \mathbf {R}^{m_2\times 1},\varvec{\lambda }_2\in \mathbf {R}^{m_1\times 1},\varvec{\lambda }_3\in \mathbf {R}^{m_1\times 1}\) the corresponding multipliers of problem (2) and setting \(\varvec{\sigma }=(\varvec{\lambda }_1^T,\varvec{\lambda }_2^T,\varvec{\lambda }_3^T)^T\), the dual of problem (2) is

$$\begin{aligned} \max _{\varvec{\lambda }_1,\varvec{\lambda }_2,\varvec{\lambda }_3}~~&-\frac{1}{2c_4}\varvec{\sigma }^T\varvec{\tilde{H}}\varvec{\sigma }+\varvec{e}_1^T(\varvec{\lambda }_2-\varvec{\lambda }_3) \\ \text {s.t.}~~&\varvec{\lambda }_2\ge 0,~\varvec{\lambda }_3\ge 0,~\varvec{\lambda }_2-\varvec{\lambda }_3\le c_5\varvec{e}_1,~\varvec{e}_1^T\varvec{\lambda }_3\le c_6, \end{aligned}$$
(10)

where

$$\begin{aligned} \varvec{\tilde{H}}= & {} \left[ \begin{array}{c c c} \varvec{BB}^T+c_4\varvec{I}&{}-\varvec{BA}^T&{}\varvec{BA}^T\\ -\varvec{AB}^T&{}\varvec{AA}^T&{}-\varvec{AA}^T\\ \varvec{AB}^T&{}-\varvec{AA}^T&{}\varvec{AA}^T \end{array} \right] + \left[ \begin{array}{c c c} \varvec{e}_2\varvec{e}_2^T&{}-\varvec{e}_2\varvec{e}_1^T&{}\varvec{e}_2\varvec{e}_1^T\\ -\varvec{e}_1\varvec{e}_2^T&{}\varvec{e}_1\varvec{e}_1^T&{}-\varvec{e}_1\varvec{e}_1^T\\ \varvec{e}_1\varvec{e}_2^T&{}-\varvec{e}_1\varvec{e}_1^T&{}\varvec{e}_1\varvec{e}_1^T \end{array} \right] . \end{aligned}$$

Once \((\varvec{w}_1,b_1)\) and \((\varvec{w}_2,b_2)\) are obtained from (9) and (10) by means of the dual coordinate descent method [15], a new input \(\varvec{x}\in \mathbf {R}^n\) is assigned to class \(i~(i=+1,-1)\) by \({Class~i}=\arg \min \limits _{k=1,2} \frac{|\varvec{w}_k^T\varvec{x}+b_k|}{\Vert \varvec{w}_k\Vert }.\)
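A minimal sketch of this decision rule (illustrative names, not the paper's Matlab code):

```python
import numpy as np

def predict(X, w1, b1, w2, b2):
    """Assign each row of X to class +1 or -1 by the nearer of the two hyperplanes."""
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)   # distance to f_1(x) = 0
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)   # distance to f_2(x) = 0
    return np.where(d1 <= d2, 1, -1)
```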

3.2 Kernel PNSVM

Kernel PNSVM seeks the following two kernel-generated surfaces instead of hyperplanes: \(K(\varvec{x}^T, \varvec{C}^T)\varvec{w}_1+b_1 = 0\) and \(K(\varvec{x}^T, \varvec{C}^T)\varvec{w}_2+b_2 = 0,\) where \(\varvec{C}^T=[\varvec{A}~\varvec{B}]^T\in \mathbf {R}^{n\times \ell }\) and \(\ell =m_1+m_2\). The corresponding optimization problems are

$$\begin{aligned} \min _{\varvec{w}_1,b_1,\varvec{p},\varvec{q}_1,\rho _1}~~&\frac{1}{2}\varvec{p}^T\varvec{p}+\frac{1}{2}c_1(\Vert \varvec{w}_1\Vert ^2+b_1^2)+c_2\varvec{e}_2^T\varvec{q}_1+c_3\rho _1 \\ \text {s.t.}~~&K(\varvec{A},\varvec{C}^T)\varvec{w}_1+\varvec{e}_1b_1=\varvec{p},\\ &-(K(\varvec{B},\varvec{C}^T)\varvec{w}_1+\varvec{e}_2b_1)\ge \varvec{e}_2-\varvec{q}_1,\\ &-(K(\varvec{B},\varvec{C}^T)\varvec{w}_1+\varvec{e}_2b_1)\le \rho _1\varvec{e}_2-\varvec{q}_1,\\ &\varvec{q}_1\ge 0,~\rho _1\ge 1 \end{aligned}$$
(11)

and

$$\begin{aligned} \min _{\varvec{w}_2,b_2,\varvec{\tilde{p}},\varvec{q}_2,\rho _2}~~&\frac{1}{2}\varvec{\tilde{p}}^T\varvec{\tilde{p}}+\frac{1}{2}c_4(\Vert \varvec{w}_2\Vert ^2+b_2^2)+c_5\varvec{e}_1^T\varvec{q}_2+c_6\rho _2 \\ \text {s.t.}~~&K(\varvec{B},\varvec{C}^T)\varvec{w}_2+\varvec{e}_2b_2=\varvec{\tilde{p}},\\ &K(\varvec{A},\varvec{C}^T)\varvec{w}_2+\varvec{e}_1b_2\ge \varvec{e}_1-\varvec{q}_2,\\ &K(\varvec{A},\varvec{C}^T)\varvec{w}_2+\varvec{e}_1b_2\le \rho _2\varvec{e}_1-\varvec{q}_2,\\ &\varvec{q}_2\ge 0,~\rho _2\ge 1 \end{aligned}$$
(12)

where \(c_i, i=1,\cdots ,6\) are parameters.

Table 1. The comparison results of linear classifiers.

4 Experimental Results

In this section, UCI data sets are chosen to demonstrate the performance of our PNSVM compared with SVC, TWSVM and WLTSVM [13]. The methods are implemented in Matlab 9.0 on a PC with an Intel(R) Core i7 (2.70 GHz) processor and 32 GB RAM. Our PNSVM is solved by the dual coordinate descent algorithm, while SVC and TWSVM are solved by the QP solver of the Matlab optimization toolbox. The classification accuracy is measured by standard 10-fold cross-validation.
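For reference, the dual coordinate descent method of [15] updates one multiplier at a time; a minimal sketch for a generic box-constrained QP of the form \(\min _{\varvec{\pi }}\ \frac{1}{2}\varvec{\pi }^T\varvec{H}\varvec{\pi }-\varvec{\kappa }^T\varvec{\pi }\), \(\varvec{l}\le \varvec{\pi }\le \varvec{u}\), is given below. This is only an illustration of the solver family; the PNSVM duals (9) and (10) involve additional linear constraints that this sketch does not handle, and all names are illustrative.

```python
import numpy as np

def dcd_box_qp(H, kappa, lower, upper, n_sweeps=100):
    """Coordinate descent for min 0.5*pi^T H pi - kappa^T pi  s.t. lower <= pi <= upper."""
    pi = np.clip(np.zeros_like(kappa), lower, upper)
    grad = H @ pi - kappa                     # gradient of the objective at pi
    for _ in range(n_sweeps):
        for i in range(len(pi)):
            if H[i, i] <= 0.0:
                continue
            # Exact minimization along coordinate i, then projection onto the box.
            new_i = np.clip(pi[i] - grad[i] / H[i, i], lower[i], upper[i])
            delta = new_i - pi[i]
            if delta != 0.0:
                grad += delta * H[:, i]       # keep the gradient up to date
                pi[i] = new_i
    return pi
```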

Fig. 1. The influence on Accuracy (AC) of parameters \(c_3\) and \(c_6\) of linear PNSVM.

The classification accuracy, computation time, and optimal values of \(c_1(=c_2)\), \(c_4(=c_5)\), \(c_3\) and \(c_6\) in our PNSVM are listed in Table 1. The parameters of PNSVM, TWSVM, SVC and WLTSVM are searched in the range \(\{2^{2i}|i=-8,\cdots ,8\}\), and the parameters \(c_3\) and \(c_6\) of PNSVM are selected from the set \(\{0.01,0.1,1,10,100\}\). Table 1 shows the comparison results of all four methods. It can be seen from Table 1 that the accuracy of the proposed linear PNSVM is significantly better than that of the linear TWSVM on most of the data sets, and our PNSVM is the fastest on most of the data sets. Figure 1 exhibits the influence of the parameters \(c_3\) and \(c_6\) on the accuracy of linear PNSVM. It can be observed that the choice of \(c_3\) and \(c_6\) affects the results dramatically, which implies that tuning \(c_3\) and \(c_6\) is of practical importance in real applications. Table 2 is concerned with our kernel PNSVM, TWSVM, SVC and WLTSVM. The Gaussian kernel \(K(\varvec{x},\varvec{x}')=e^{-\mu \Vert \varvec{x}-\varvec{x}'\Vert ^2}\) is used. The kernel parameter \(\mu \) is searched from the set \(\{2^i|i=-8,\cdots ,8\}\). The classification accuracy and computation time of all four methods are also listed in Table 2.
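For completeness, the Gaussian kernel matrix defined above can be computed as in the following sketch (the function name and arguments are illustrative):

```python
import numpy as np

def gaussian_kernel(X, C, mu):
    """K[i, j] = exp(-mu * ||X[i] - C[j]||^2) for row-sample matrices X and C."""
    sq_dist = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(C ** 2, axis=1)[None, :]
        - 2.0 * X @ C.T
    )
    return np.exp(-mu * np.maximum(sq_dist, 0.0))
```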

Table 2. The comparison results of kernel classifiers.

5 Conclusions

For the binary classification problem, a novel large margin proximal non-parallel support vector machine (PNSVM) was proposed in this paper. The main contributions are that the structural risk minimization principle is implemented by introducing a regularization term in the primal problems of our PNSVM, and that the dual formulation of PNSVM avoids the computation of inverse matrices and speeds up training. Experimental comparisons of our PNSVM with SVC, TWSVM and WLTSVM on several data sets show that the proposed method is not only faster but also exhibits better generalization. It should be pointed out that there are six parameters in our PNSVM, so parameter selection, together with more efficient algorithms [17], is a practical problem that should be addressed in the future.