
Neurocomputing

Volume 70, Issues 7–9, March 2007, Pages 1554-1560

Letters
A fast approximate algorithm for training L1-SVMs in primal space

https://doi.org/10.1016/j.neucom.2006.11.003

Abstract

We propose a novel and fast algorithm to train support vector machines (SVMs) in primal space. It solves an approximate formulation of the SVM optimization problem that is unconstrained, continuous and twice differentiable, using the Newton optimization technique. Further, we devise a special pre-extracting procedure that speeds up the convergence of the algorithm by supplying it with a high-quality initial solution. Theoretical studies show that the proposed algorithm produces an ɛ-approximate solution to standard SVMs while maintaining low computational complexity. Experimental results on benchmark data sets demonstrate that our algorithm is much faster than dual-based methods such as SVMlight while achieving similar test accuracy.

Introduction

Support vector machines (SVMs) are modern learning systems that deliver state-of-the-art performance in real-world pattern recognition applications, such as text categorization and handwritten character recognition. SVMs are grounded in the statistical learning theory developed by Vapnik [2], [10]. The literature on solving SVMs concentrates mainly on the dual optimization, e.g. SVMlight [4] and SMO [9], whose core is a convex quadratic programming (QP) problem with simple constraints. However, there are computational and storage limitations when seeking an exact solution to SVMs in dual space for large-scale classification tasks. An alternative is to learn an approximate solution to SVMs quickly in primal space, as suggested by Keerthi et al. [5] and Chapelle [3].

Recently, several approaches to the primal optimization of SVMs have been developed. Mangasarian [7] defined a generalized Hessian and transformed the original primal problem into the unconstrained minimization of a strongly convex, piecewise quadratic objective function. Keerthi et al. [5] developed a fast method for solving linear SVMs with the L2 loss function, which was much faster than dual methods and suitable for large-scale data mining tasks. Chapelle [3] recently extended these studies to the non-linear case and devised a promising primal Newton algorithm. That algorithm, however, did not achieve training speed superior to dual methods despite its feasibility. A further deficiency of Chapelle's method is its inability to guarantee the accuracy of its approximate solution.

In this paper, we propose a fast algorithm for training L1-SVMs in primal space, which reaches an ɛ-approximate solution with guaranteed accuracy. We also present a special pre-extracting procedure to further accelerate its convergence. Theoretical analyses show that our algorithm attains the same computational complexity as the lower bound for solving SVMs. Experimental results indicate that the proposed algorithm is very competitive for training SVMs: it learns classifiers more quickly and cheaply while maintaining accuracy comparable to SVMlight.

The paper is organized as follows: Section 2 derives an approximate primal optimization of SVMs, which is unconstrained and twice differentiable. Section 3 proposes a Newton-type algorithm for solving it and then analyzes the costs of the algorithm as well as the properties of the solution. Section 4 presents an experimental evaluation on real benchmark data sets that demonstrates the efficiency of our method. Section 5 contains some concluding remarks.


Training L1-SVMs in the primal space

Consider a binary classification problem with training samples $\{x_i, y_i\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{+1, -1\}$. The separating hyperplane of non-linear L1-SVMs is determined by solving the following primal optimization [2], [10]:

$$\min_{w,b,\xi}\; g(w,b) = \tfrac{1}{2} w^{\mathrm{T}} w + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.}\quad y_i\left[w^{\mathrm{T}} \varphi(x_i) + b\right] \ge 1 - \xi_i,\; \xi_i \ge 0,$$

where $\varphi(\cdot): \mathbb{R}^d \to \mathbb{R}^N$ is the reproducing kernel map determined by the kernel function $k(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$, and $\varphi(x_i)$ is the image of $x_i$ in the feature space $\mathbb{R}^N$. Using the representer theorem [6], we represent $w$ in the reproducing kernel Hilbert space as a linear combination of the mapped training points, $w = \sum_{i=1}^{n} \beta_i \varphi(x_i)$.
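Although the smoothed objective (5) itself is not reproduced in this snippet, the ingredients above (an L1 hinge loss, a Huber parameter $\gamma$ and the representer expansion of $w$) suggest its general shape. The following NumPy sketch shows one common Huber-style smoothing of the hinge loss and the resulting unconstrained primal objective in the coefficients $\beta$; the helper names and the exact smoothing form are illustrative assumptions, not the paper's own definitions.

```python
import numpy as np

def huber_hinge(m, gamma):
    """A common Huber-style smoothing of the hinge loss max(0, 1 - m).

    Zero for margins m >= 1 + gamma, quadratic on (1 - gamma, 1 + gamma),
    linear below: continuous with a continuous first derivative and a
    piecewise second derivative usable by (generalized) Newton steps.
    Illustrative only; the paper's eq. (5) is not shown in this snippet.
    """
    loss = np.zeros_like(m, dtype=float)
    lin = m < 1.0 - gamma
    quad = ~lin & (m < 1.0 + gamma)
    loss[lin] = 1.0 - m[lin]
    loss[quad] = (1.0 + gamma - m[quad]) ** 2 / (4.0 * gamma)
    return loss

def primal_objective(beta, b, K, y, lam, gamma):
    """Unconstrained primal objective in the representer coefficients beta,
    with w = sum_i beta_i * phi(x_i), so that w^T w = beta^T K beta.
    Here lam is a regularization weight (playing roughly the role of 1/C).
    """
    margins = y * (K @ beta + b)
    return 0.5 * lam * beta @ K @ beta + huber_hinge(margins, gamma).sum()
```

Because each piece of the smoothed loss is polynomial and the pieces join with matching values and slopes, the objective above is smooth enough for the Newton-type optimization the paper builds on.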

The proposed PN-pe algorithm

Based on the above analyses, we present a Newton-type algorithm for solving (5), called the "primal Newton algorithm with pre-extracting" (PN-pe); an illustrative code sketch follows the listing below.

Algorithm 1

PN-pe:

Input: training samples $\{x_i, y_i\}_{i=1}^{n}$, kernel function $k(x_i, x_j)$, penalty parameter $\lambda$, Huber parameter $\gamma > 0$ and a small real $\upsilon > 0$.

(a) Initialization: run the pre-extracting procedure to initialize $\mathrm{USVs}^0$ and $\mathrm{BSVs}^0$. Calculate the kernel submatrices $K_u^0$, $K_b^0$, $K_{u,b}^0$, $K_{u,\bar{s}}^0$ and $K_{b,\bar{s}}^0$. Let $k = 0$.

(b) $k = k + 1$; update $z_k$ according to (11) and calculate $r_k = e - D A^{\mathrm{T}} z_k$. …
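The update (11) and the USVs/BSVs bookkeeping are not reproduced in this snippet, but the overall structure of a primal Newton iteration on the Huber-smoothed objective can be illustrated. The sketch below is a plain Newton loop over all training points (no pre-extracting, no bias term), in the style of Chapelle's primal Newton method; the function names and the concrete update are assumptions for illustration, not the paper's PN-pe.

```python
import numpy as np

def huber_hinge_grad_hess(m, gamma):
    """First derivative g and (generalized) second derivative h of the
    Huber-smoothed hinge loss with respect to the margin m."""
    g = np.zeros_like(m, dtype=float)
    h = np.zeros_like(m, dtype=float)
    lin = m < 1.0 - gamma
    quad = ~lin & (m < 1.0 + gamma)
    g[lin] = -1.0
    g[quad] = -(1.0 + gamma - m[quad]) / (2.0 * gamma)
    h[quad] = 1.0 / (2.0 * gamma)
    return g, h

def train_primal_newton(K, y, lam, gamma, n_iter=50, tol=1e-6):
    """Newton iterations for f(beta) = lam/2 beta^T K beta + sum_i l(m_i),
    with margins m_i = y_i (K beta)_i.

    Since grad f = K (lam*beta + y*g) and Hess f = K (lam*I + diag(h) K),
    the Newton direction solves (lam*I + diag(h) K) delta = -(lam*beta + y*g),
    i.e. the factor K cancels for positive-definite K.
    """
    n = len(y)
    beta = np.zeros(n)
    for _ in range(n_iter):
        m = y * (K @ beta)
        g, h = huber_hinge_grad_hess(m, gamma)
        rhs = -(lam * beta + y * g)
        delta = np.linalg.solve(lam * np.eye(n) + h[:, None] * K, rhs)
        beta += delta
        if np.linalg.norm(delta) <= tol * (1.0 + np.linalg.norm(beta)):
            break
    return beta
```

In this naive form every iteration costs one $n \times n$ linear solve; the pre-extracting step of PN-pe is aimed precisely at starting from a high-quality split of the data (the $\mathrm{USVs}$/$\mathrm{BSVs}$ sets above), so that the Newton systems stay small and few iterations are needed.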

Experimental evaluation

We evaluate the performance of the PN-pe algorithm on a wide range of benchmark data sets. A full description of them is given in Table 1; the shuttle and vehicle data sets are from the LIBSVM data collection and the rest are from the UCI repository.

The first experiment compares the performance of the PN-pe algorithm against the well-known dual-based training algorithm SVMlight [4]. …

Conclusions

The main contribution of this paper is a novel and efficient primal algorithm for training L1-SVMs, which quickly yields an ɛ-approximate solution with guaranteed accuracy. To speed up the proposed algorithm, we further devise a pre-extracting procedure that accelerates its convergence by providing a high-quality initial solution. Experimental results on real benchmark data sets demonstrate that the proposed algorithm is competitive for training L1-SVMs in primal space.

Acknowledgments

The authors wish to thank the reviewers and the editors for their comments and suggestions, which have helped to improve the manuscript considerably.


References (12)

  • A. Bordes et al., Fast kernel classifiers with online and active learning, J. Mach. Learn. Res. (2005).
  • C. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Discovery (1998).
  • O. Chapelle, Training a support vector machine in the primal, Technical report, Max Planck Institute, 2006.
  • T. Joachims, Making large-scale SVM learning practical.
  • S.S. Keerthi et al., A modified finite Newton method for fast solution of large scale linear SVMs, J. Mach. Learn. Res. (2005).
  • G.S. Kimeldorf et al., A correspondence between Bayesian estimation on stochastic processes and smoothing by splines, Ann. Math. Stat. (1970).


Lei Wang received his B.S. and M.S. degrees in Computer Science from Southwest Jiaotong University, China, in 2000 and 2003, respectively. He is currently pursuing a Ph.D. degree at the School of Computer Science and Engineering, University of Electronic Science and Technology of China. His research interests include pattern recognition, data mining and support vector machines.

Shixin Sun was born in 1940 in Hubei Province, China. He graduated from the College of Mathematics, Sichuan University, China, in 1966. From 1984 to 1987, he was a Visiting Scholar and Research Fellow in the Institute of Computation and Applied Mathematics, Joseph Fourier University (Grenoble 1), France. During 1990–99, he worked at several universities in France, Italy and Germany. Currently, he is a professor in the School of Computer Science and Engineering, University of Electronic Science and Technology of China.

Kai Zhang received his B.S. degree in Information Engineering from Nanjing University of Science and Technology, China, in 1999, and his M.S. degree in Computer Science from the International University Bremen, Germany, in 2006. He is currently pursuing a Ph.D. degree at the Center for Intelligent Systems, Control and Robotics (CISCOR), Florida State University. His research interests include robot learning, human-robot interaction, multi-vehicle coordination and motion planning.
