
Neurocomputing

Volume 70, Issues 7–9, March 2007, Pages 1554-1560

Letters
A fast approximate algorithm for training L1-SVMs in primal space

https://doi.org/10.1016/j.neucom.2006.11.003

Abstract

We propose a novel and fast algorithm to train support vector machines (SVMs) in primal space. It solves an approximate formulation of the SVM optimization problem that is unconstrained, continuous and twice differentiable, using the Newton optimization technique. Further, we devise a special pre-extracting procedure that speeds up the convergence of the algorithm by supplying it with a high-quality initial solution. Theoretical studies show that the proposed algorithm produces an ɛ-approximate solution to standard SVMs while maintaining low computational complexity. Experimental results on benchmark data sets demonstrate that our algorithm is much faster than dual-based methods such as SVMlight while achieving similar test accuracy.

Introduction

Support vector machines (SVMs) are modern learning systems that deliver state-of-the-art performance in real-world pattern recognition applications, such as text categorization and handwritten character recognition. SVMs are grounded in the statistical learning theory developed by Vapnik [2], [10]. The literature on solving SVMs concentrates mainly on the dual optimization, e.g. SVMlight [4] and SMO [9], whose core is a convex quadratic programming (QP) problem with simple constraints. However, there are computational and storage limitations when seeking an exact solution to SVMs in dual space for large-scale classification tasks. An alternative is to learn an approximate solution to SVMs quickly in primal space, as suggested by Keerthi et al. [5] and Chapelle [3].

Recently, several approaches to the primal optimization of SVMs have been developed. Mangasarian [7] defined a generalized Hessian and transformed the original primal problem into the unconstrained minimization of a strongly convex, piecewise quadratic objective function. Keerthi et al. [5] developed a fast method for solving linear SVMs with the L2 loss function, which was much faster than dual methods and suitable for large-scale data mining tasks. Chapelle [3] recently extended these studies to the non-linear case and devised a promising primal Newton algorithm. That algorithm, however, did not achieve training speed superior to dual methods despite its feasibility. A further deficiency of Chapelle's method is its inability to guarantee the accuracy of its approximate solution.

In this paper, we propose a fast algorithm for training L1-SVMs in primal space, which reaches an ɛ-approximate solution with guaranteed accuracy. We also present a special pre-extracting procedure to further accelerate its convergence. Theoretical analyses show that our algorithm attains the same computational complexity as the lower bound for solving SVMs. Experimental results indicate that the proposed algorithm is very competitive for training SVMs: it learns classifiers more quickly and cheaply while maintaining accuracy comparable to SVMlight.

The paper is organized as follows: Section 2 derives an approximate primal optimization of SVMs, which is unconstrained and twice differentiable. Section 3 proposes a Newton-type algorithm for solving it and then analyzes the costs of the algorithm as well as the properties of the solution. Section 4 presents an experimental evaluation on real benchmark data sets that demonstrates the efficiency of our method. Section 5 contains some concluding remarks.


Training L1-SVMs in the primal space

Consider a binary classification problem with training samples $\{x_i, y_i\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{+1, -1\}$. The separating hyperplane of non-linear L1-SVMs is determined by solving the following primal optimization [2], [10]:

$$\min_{w,b,\xi}\; g(w,b) = \tfrac{1}{2} w^{\mathrm{T}} w + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.}\quad y_i\left[w^{\mathrm{T}} \varphi(x_i) + b\right] \ge 1 - \xi_i,\; \xi_i \ge 0,$$

where $\varphi(\cdot): \mathbb{R}^d \to \mathbb{R}^N$ is the reproducing kernel map determined by the kernel function $k(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$, and $\varphi(x_i)$ is the image of $x_i$ in the feature space $\mathbb{R}^N$. Using the representer theorem [6], we represent $w$ in the reproducing kernel Hilbert space as a linear combination of the mapped training points, $w = \sum_{i=1}^{n} \beta_i \varphi(x_i)$.
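Although the smoothed objective (5) itself is not reproduced in this snippet, the ingredients above (an L1 hinge loss, a Huber parameter $\gamma$ and the representer expansion of $w$) suggest its general shape. The following NumPy sketch shows one common Huber-style smoothing of the hinge loss and the resulting unconstrained primal objective in the coefficients $\beta$; the helper names and the exact smoothing form are illustrative assumptions, not the paper's own definitions.

```python
import numpy as np

def huber_hinge(m, gamma):
    """A common Huber-style smoothing of the hinge loss max(0, 1 - m).

    Zero for margins m >= 1 + gamma, quadratic on (1 - gamma, 1 + gamma),
    linear below: continuous with a continuous first derivative and a
    piecewise second derivative usable by (generalized) Newton steps.
    Illustrative only; the paper's eq. (5) is not shown in this snippet.
    """
    loss = np.zeros_like(m, dtype=float)
    lin = m < 1.0 - gamma
    quad = ~lin & (m < 1.0 + gamma)
    loss[lin] = 1.0 - m[lin]
    loss[quad] = (1.0 + gamma - m[quad]) ** 2 / (4.0 * gamma)
    return loss

def primal_objective(beta, b, K, y, lam, gamma):
    """Unconstrained primal objective in the representer coefficients beta,
    with w = sum_i beta_i * phi(x_i), so that w^T w = beta^T K beta.
    Here lam is a regularization weight (playing roughly the role of 1/C).
    """
    margins = y * (K @ beta + b)
    return 0.5 * lam * beta @ K @ beta + huber_hinge(margins, gamma).sum()
```

Because each piece of the smoothed loss is polynomial and the pieces join with matching values and slopes, the objective above is smooth enough for the Newton-type optimization the paper builds on.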

The proposed PN-pe algorithm

Based on the above analyses, we present a Newton-type algorithm for solving (5), called the "primal Newton algorithm with pre-extracting" (PN-pe); an illustrative code sketch follows the listing below.

Algorithm 1

PN-pe:

Input: training samples $\{x_i, y_i\}_{i=1}^{n}$, kernel function $k(x_i, x_j)$, penalty parameter $\lambda$, Huber parameter $\gamma > 0$ and a small real $\upsilon > 0$.

(a) Initialization: run the pre-extracting procedure to initialize $\mathrm{USVs}^0$ and $\mathrm{BSVs}^0$. Calculate the kernel submatrices $K_u^0$, $K_b^0$, $K_{u,b}^0$, $K_{u,\bar{s}}^0$ and $K_{b,\bar{s}}^0$. Let $k = 0$.

(b) $k = k + 1$; update $z_k$ according to (11) and calculate $r_k = e - D A^{\mathrm{T}} z_k$. …
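The update (11) and the USVs/BSVs bookkeeping are not reproduced in this snippet, but the overall structure of a primal Newton iteration on the Huber-smoothed objective can be illustrated. The sketch below is a plain Newton loop over all training points (no pre-extracting, no bias term), in the style of Chapelle's primal Newton method; the function names and the concrete update are assumptions for illustration, not the paper's PN-pe.

```python
import numpy as np

def huber_hinge_grad_hess(m, gamma):
    """First derivative g and (generalized) second derivative h of the
    Huber-smoothed hinge loss with respect to the margin m."""
    g = np.zeros_like(m, dtype=float)
    h = np.zeros_like(m, dtype=float)
    lin = m < 1.0 - gamma
    quad = ~lin & (m < 1.0 + gamma)
    g[lin] = -1.0
    g[quad] = -(1.0 + gamma - m[quad]) / (2.0 * gamma)
    h[quad] = 1.0 / (2.0 * gamma)
    return g, h

def train_primal_newton(K, y, lam, gamma, n_iter=50, tol=1e-6):
    """Newton iterations for f(beta) = lam/2 beta^T K beta + sum_i l(m_i),
    with margins m_i = y_i (K beta)_i.

    Since grad f = K (lam*beta + y*g) and Hess f = K (lam*I + diag(h) K),
    the Newton direction solves (lam*I + diag(h) K) delta = -(lam*beta + y*g),
    i.e. the factor K cancels for positive-definite K.
    """
    n = len(y)
    beta = np.zeros(n)
    for _ in range(n_iter):
        m = y * (K @ beta)
        g, h = huber_hinge_grad_hess(m, gamma)
        rhs = -(lam * beta + y * g)
        delta = np.linalg.solve(lam * np.eye(n) + h[:, None] * K, rhs)
        beta += delta
        if np.linalg.norm(delta) <= tol * (1.0 + np.linalg.norm(beta)):
            break
    return beta
```

In this naive form every iteration costs one $n \times n$ linear solve; the pre-extracting step of PN-pe is aimed precisely at starting from a high-quality split of the data (the $\mathrm{USVs}$/$\mathrm{BSVs}$ sets above), so that the Newton systems stay small and few iterations are needed.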

Experimental evaluation

We evaluate the performance of the PN-pe algorithm on a wide range of benchmark data sets. A full description of them is given in Table 1; the shuttle and vehicle data sets are from the LIBSVM data collection and the rest are from the UCI repository.

The first experiment compares the performance of the PN-pe algorithm against the well-known dual-based training algorithm SVMlight [4]. …

Conclusions

The main contribution of this paper is a novel and efficient primal algorithm for training L1-SVMs, which quickly yields an ɛ-approximate solution with guaranteed accuracy. To speed up the proposed algorithm, we further devise a pre-extracting procedure that accelerates its convergence by providing a high-quality initial solution. Experimental results on real benchmark data sets demonstrate that the proposed algorithm is competitive for training L1-SVMs in primal space.

Acknowledgments

The authors wish to thank the reviewers and the editors for their comments and suggestions, which have helped to improve the manuscript considerably.


References (12)

  • A. Bordes et al., Fast kernel classifiers with online and active learning, J. Mach. Learn. Res. (2005).
  • C. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Discovery (1998).
  • O. Chapelle, Training a support vector machine in the primal, Technical report, Max Planck Institute, 2006.
  • T. Joachims, Making large-scale SVM learning practical.
  • S.S. Keerthi et al., A modified finite Newton method for fast solution of large scale linear SVMs, J. Mach. Learn. Res. (2005).
  • G.S. Kimeldorf et al., A correspondence between Bayesian estimation on stochastic processes and smoothing by splines, Ann. Math. Stat. (1970).


Lei Wang received his B.S. and M.S. degrees in Computer Science from Southwest Jiaotong University, China, in 2000 and 2003, respectively. He is currently pursuing a Ph.D. degree at the School of Computer Science and Engineering, University of Electronic Science and Technology of China. His research interests include pattern recognition, data mining and support vector machines.

Shixin Sun was born in 1940 in Hubei Province, China. He graduated from the College of Mathematics, Sichuan University, China, in 1966. From 1984 to 1987, he was a Visiting Scholar and Research Fellow in the Institute of Computation and Applied Mathematics, Joseph Fourier University (Grenoble 1), France. During 1990–99, he worked at several universities in France, Italy and Germany. Currently, he is a professor in the School of Computer Science and Engineering, University of Electronic Science and Technology of China.

Kai Zhang received his B.S. degree in Information Engineering from Nanjing University of Science and Technology, China, in 1999, and his M.S. degree in Computer Science from the International University Bremen, Germany, in 2006. He is currently pursuing a Ph.D. degree at the Center for Intelligent Systems, Control and Robotics (CISCOR), Florida State University. His research interests include robot learning, human-robot interaction, multi-vehicle coordination and motion planning.
