
Expert Systems with Applications

Volume 87, 30 November 2017, Pages 157-169

Training soft margin support vector machines by simulated annealing: A dual approach

https://doi.org/10.1016/j.eswa.2017.06.016

Highlights

  • A method is proposed to solve the dual quadratic optimization problem of SVMs.

  • The proposal named SATE is based on simulated annealing.

  • The objective function and constraints for SVM were successfully embedded in SATE.

  • Our proposal is very simple to implement and achieves high sparseness.

  • Our proposal was tested on real-world datasets and evaluated by statistical tests.

Abstract

A theoretical advantage of support vector machines (SVM) is the minimization of both empirical and structural risk, which balances the complexity of the model against its success at fitting the training data. Metaheuristics have mostly been used with support vector machines either to tune hyperparameters or to perform feature selection. In this paper, we present a new approach to obtain sparse support vector machines based on simulated annealing (SA), named SATE. In our proposal, SA is used to solve the quadratic optimization problem that emerges from support vector machines rather than to tune the hyperparameters. We have compared our proposal with sequential minimal optimization (SMO), kernel adatron (KA), a usual QP solver, as well as with recent Particle Swarm Optimization (PSO) and Genetic Algorithm (GA)-based versions. Generally speaking, one can infer that SATE is equivalent to SMO in terms of accuracy and mean number of support vectors, and sparser than KA, QP, linear PSO (LPSO), and GA. SATE also achieves higher accuracies than the GA and PSO-based versions. Moreover, SATE successfully embeds the SVM constraints and provides a competitive classifier while maintaining its simplicity and high sparseness in the solution.

Introduction

Pattern classification methods aim at providing functions that define the relation between input vectors and their class labels. Unlike the training process of artificial neural networks (ANNs) and decision trees, the training of large margin classifiers such as support vector machines (SVM) relies on minimizing both the empirical and the structural risk (Vapnik, 1998). Minimizing the empirical risk means reducing the classification error on the training dataset, while minimizing the structural risk is related to decreasing the classification error on unseen patterns. Since learning algorithms are mostly based on minimizing only the empirical risk, this is an important advantage of SVMs over other classifiers.

In order to train SVMs, one must solve a quadratic programming problem (also called a quadratic optimization problem). SVMs have two formulations of this quadratic programming problem, the primal and the dual. The dual optimization problem is presented in terms of Lagrange multipliers and the bias. There are a few methods used to obtain the Lagrange multipliers and the bias, such as sequential minimal optimization (SMO) (Platt, 1999), kernel adatron (KA) (Anlauf & Biehl, 1989) or even classical mathematical methods based on numerical optimization (Gill, Murray, & Wright, 1981). SMO solves the SVM quadratic programming problem analytically by decomposing it into quadratic programming sub-problems and solving the smallest possible optimization problem, which involves only two Lagrange multipliers, at each step. KA is an online method that solves the SVM quadratic programming problem by using first-order information (the gradient). Classical mathematical methods, henceforth called QP, solve the quadratic programming problem by numerical optimization, which unfortunately demands the manipulation of large matrices, leading to more numerical precision errors. Other strategies for efficiently training SVMs can be found in Djuric, Lan, Vucetic, and Wang (2013) and Frandi, Ñanculef, Lodi, Sartori, and Suykens (2015).
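Since KA relies only on first-order information, its core update is simple. The sketch below illustrates a soft-margin kernel adatron step in Python; it is a minimal illustration, assuming a precomputed kernel matrix `K`, a fixed learning rate `eta`, and the box constraint 0 ≤ αi ≤ C, with the bias handled separately. It is not the KA implementation compared in this paper.

```python
import numpy as np

def kernel_adatron(K, y, C, eta=0.01, n_epochs=100):
    """Minimal soft-margin kernel adatron sketch (bias handling omitted for brevity).

    K : (l, l) precomputed kernel matrix
    y : (l,) labels in {-1, +1}
    C : box-constraint upper bound on the Lagrange multipliers
    """
    l = len(y)
    alpha = np.zeros(l)
    for _ in range(n_epochs):
        for i in range(l):
            # Margin of pattern i under the current multipliers
            z_i = np.sum(alpha * y * K[:, i])
            # Gradient-based update, clipped to the box 0 <= alpha_i <= C
            alpha[i] = np.clip(alpha[i] + eta * (1.0 - y[i] * z_i), 0.0, C)
    return alpha
```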

Metaheuristics such as simulated annealing (SA) and genetic algorithms (GAs) are an alternative to methods based on numerical optimization, such as those included in classical mathematical libraries. These metaheuristics aim at generating useful solutions to search problems. Due to their underlying features, some optimization problems can be solved without assuming linearity, differentiability, continuity or convexity of the objective function, assumptions that several classical mathematical methods do require. Metaheuristics have been used with SVMs in classification tasks mainly to tune their parameters, select features and obtain reduced sets of support vectors (da Rocha Neto & Barreto, 2013; Silva, Silva, & Neto, 2015).

The first attempt to use metaheuristics (in fact, GAs) to solve the primal problem of SVMs for classification tasks required some changes to the primal problem formulation (Dumitrescu, Preuss, Stoean, & Stoean, 2006; Stoean, Preuss, Stoean, & Dumitrescu, 2007). A similar modeling approach was also proposed to train SVMs for regression tasks (Stoean, Dumitrescu, Preuss, & Stoean, 2006), in which the regression method models a solution for the primal optimization problem. Besides these attempts, a GA-based proposal for the dual formulation was presented in Mierswa (2006); however, the problem was modified to leave one of the constraints out and the bias was set to zero.

Another attempt, based on linear PSO (Paquet & Engelbrecht, 2003), and its extensions (Li, Tong, Bai, & Zhang, 2007; Silva & Gonçalves, 2013; Yuan, Zhang, Zhang, & Yang, 2006) were applied to train SVMs. However, in such proposals the Karush-Kuhn-Tucker (KKT) conditions are not fully satisfied. Therefore, to the best of our knowledge, there is no metaheuristic-based method in the literature that deals with the quadratic optimization problem of SVMs in its default dual formulation and, as a consequence, also keeps the primal problem as initially proposed by Vapnik (1998).

In this paper, we introduce SATE, a novel learning approach based on simulated annealing for training support vector machines easily. To do so, we model an instance of SA to handle the dual optimization problem and its constraints in order to obtain the Lagrange multipliers and the bias for the decision function. In terms of SA, we propose a solution modeling, as well as a neighborhood function and an energy function.

The remaining part of this paper is organized as follows. In Section 2, we review the fundamentals of support vector machines. In Section 3, some learning algorithms for SVMs are presented. After that, in Section 4, we briefly introduce simulated annealing, and then in Section 5 we present our proposal. In Section 6, we present our simulations and, finally, the paper is concluded in Section 7.

Section snippets

Support vector machines

Consider a training dataset $\{\mathbf{x}_i, y_i\}_{i=1}^{l}$, so that $\mathbf{x}_i \in \mathbb{R}^p$ is an input vector and $y_i \in \{-1,+1\}$ is the corresponding class label. For soft margin classification, the SVM (Vapnik, 1998) primal problem is defined as
$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \left\{ \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{i=1}^{l}\xi_i \right\}, \quad \text{s.t.} \quad y_i(\mathbf{w}^T\mathbf{x}_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0,$$
where $\{\xi_i\}_{i=1}^{l}$ is the set of slack variables, $b$ is the bias and $C$ is a cost parameter that controls the trade-off between allowing training errors and forcing rigid margins.

The SVM dual optimization problem can also be presented as
$$\max_{\boldsymbol{\alpha}} \left\{ \sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j \right\}, \quad \text{s.t.} \quad \sum_{i=1}^{l}\alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C,$$
where $\{\alpha_i\}_{i=1}^{l}$ are the Lagrange multipliers associated with the training patterns.
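Once the Lagrange multipliers are found, the bias $b$ follows from the KKT conditions: for any support vector strictly inside the box ($0 < \alpha_k < C$), $y_k(\sum_i \alpha_i y_i \mathbf{x}_i^T\mathbf{x}_k + b) = 1$ holds, and averaging over all such vectors is a common, numerically safer choice. The Python sketch below illustrates this standard step; the function name and fallback rule are our own, and this is not the specific procedure of any solver compared here.

```python
import numpy as np

def compute_bias(alpha, y, K, C, tol=1e-8):
    """Estimate the bias b from the KKT conditions.

    Uses support vectors strictly inside the box (0 < alpha_k < C),
    for which y_k * (sum_i alpha_i y_i K(x_i, x_k) + b) = 1 holds exactly.
    """
    free = (alpha > tol) & (alpha < C - tol)
    if not np.any(free):                 # degenerate case: fall back to all support vectors
        free = alpha > tol
    # b = y_k - sum_i alpha_i y_i K(x_i, x_k), averaged over the free support vectors
    decision_wo_bias = K[:, free].T @ (alpha * y)
    return float(np.mean(y[free] - decision_wo_bias))
```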

Solutions for SVM

In this section, we present some learning methods used to train support vector machines.

Simulated annealing

Simulated Annealing (SA) is an adaptation of the Metropolis–Hastings algorithm and was independently described in Kirkpatrick, Gelatt, and Vecchi (1983) and Černý (1985). SA is a popular local search metaheuristic used to deal with discrete and continuous optimization problems. The key feature of simulated annealing is to provide a means to escape from local optima by allowing hill-climbing moves towards worse objective function values, in the hope that a global optimum is found.
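For reference, a generic SA loop has the following shape. This is a textbook-style minimal sketch in Python; the neighbor function, energy function, and geometric cooling schedule are generic placeholders, not the specific choices made by SATE.

```python
import math
import random

def simulated_annealing(initial, energy, neighbor,
                        t0=1.0, t_min=1e-4, cooling=0.95, steps_per_t=50):
    """Textbook simulated annealing for minimizing `energy` (a sketch, not SATE itself)."""
    current, current_e = initial, energy(initial)
    best, best_e = current, current_e
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            candidate = neighbor(current)
            delta = energy(candidate) - current_e
            # Always accept improvements; accept worse moves with probability exp(-delta / t)
            if delta <= 0 or random.random() < math.exp(-delta / t):
                current, current_e = candidate, current_e + delta
                if current_e < best_e:
                    best, best_e = current, current_e
        t *= cooling  # geometric cooling schedule
    return best, best_e
```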


Proposal: simulated annealing for training easily SVM

Our proposal, named SATE, relies on the simulated annealing algorithm, which maximizes the expression $\sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j$ presented in Eq. (4). As expected, it is an algorithm to obtain the Lagrange multipliers and the bias. In order to describe our proposal in terms of an SA-based optimization problem, we have to explain in detail the solution modeling, the neighborhood function $Next(\boldsymbol{\alpha})$ and the energy function $H(\boldsymbol{\alpha})$. We present these requirements in the next subsections.
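As an illustration only (the actual solution encoding, Next(α), and H(α) used by SATE are specified in the subsections that follow and are not reproduced here), one plausible way to express the pieces SA needs is sketched below in Python: the energy is the negative dual objective of Eq. (4), and a neighbor moves mass between two multipliers so that the constraints 0 ≤ αi ≤ C and Σ αi yi = 0 remain satisfied (starting, e.g., from α = 0). These design choices are assumptions of the sketch, not necessarily those of SATE.

```python
import numpy as np

def dual_objective(alpha, y, K):
    """W(alpha) = sum_i alpha_i - 0.5 * sum_ij alpha_i alpha_j y_i y_j K_ij (to be maximized)."""
    v = alpha * y
    return float(np.sum(alpha) - 0.5 * v @ K @ v)

def energy(alpha, y, K):
    # SA minimizes an energy, so we use the negative dual objective
    return -dual_objective(alpha, y, K)

def neighbor(alpha, y, C, rng=np.random.default_rng()):
    """Perturb a random pair of multipliers while keeping both SVM dual constraints satisfied."""
    cand = alpha.copy()
    i, j = rng.choice(len(alpha), size=2, replace=False)
    s = y[i] * y[j]
    # Changing alpha_i by d forces alpha_j to change by -s*d so that sum_k alpha_k y_k stays 0;
    # bound d so that both multipliers remain inside the box [0, C].
    if s > 0:
        lo, hi = max(-cand[i], cand[j] - C), min(C - cand[i], cand[j])
    else:
        lo, hi = max(-cand[i], -cand[j]), min(C - cand[i], C - cand[j])
    d = rng.uniform(lo, hi) if hi > lo else 0.0
    cand[i] += d
    cand[j] -= s * d
    return cand
```

These pieces could then be plugged into a generic SA loop such as the one sketched in the previous section (e.g., by fixing y, K and C with functools.partial).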

Simulations and discussion

Initially, as a proof of concept, we have trained SVMs with SMO, KA, QP and SATE to solve two artificial problems. The first problem, called Artificial Problem I (API), consists of a linearly separable two-dimensional dataset, in which data instances within each class are independent and uniformly distributed with the same within- and between-class variances. The second problem, named Artificial Problem II (APII), consists of a dataset similar to API, but with some overlap between the classes.
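For context, data matching this description could be generated along the following lines; the sample sizes, ranges, and the amount of shift between the classes are arbitrary choices for illustration and are not the exact settings used in the experiments.

```python
import numpy as np

def make_artificial_problem(n_per_class=100, separation=1.5, rng=np.random.default_rng(42)):
    """Two 2-D classes, each uniform on a unit square; separation > 1 gives a linearly
    separable dataset (API-like), separation < 1 introduces class overlap (APII-like)."""
    x_neg = rng.uniform(0.0, 1.0, size=(n_per_class, 2))
    x_pos = rng.uniform(0.0, 1.0, size=(n_per_class, 2)) + separation
    X = np.vstack([x_neg, x_pos])
    y = np.hstack([-np.ones(n_per_class), np.ones(n_per_class)])
    return X, y

# API-like: linearly separable; APII-like: overlapping classes
X_api, y_api = make_artificial_problem(separation=1.5)
X_apii, y_apii = make_artificial_problem(separation=0.5)
```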

Conclusion

We propose an algorithm named SATE to obtain the Lagrange multipliers and the bias of SVMs through simulated annealing. The SVM constraints were successfully embedded into a version of the simulated annealing algorithm, so that the quadratic optimization problem was described within the simulated annealing framework. To do so, we modeled a suitable neighborhood function and energy function. SATE was compared with SMO, KA, a QP solver, as well as with PSO and GA-based versions.

Acknowledgments

The authors would like to thank the IFCE and CAPES for supporting their research.

References

  • Frandi, E., Ñanculef, R., Lodi, S., Sartori, C., & Suykens, J. A. (2015). Fast and scalable lasso via stochastic...
  • Gill, P. E., Murray, W., & Wright, M. H. (1981). Practical optimization.
  • Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science.