Training soft margin support vector machines by simulated annealing: A dual approach
Introduction
Pattern classification methods aim at providing functions that define the relation between input vectors and their class labels. Unlike the training processes of artificial neural networks (ANNs) and decision trees, the training process of large margin classifiers such as support vector machines (SVMs) relies on minimizing both the empirical and the structural risk (Vapnik, 1998). Minimizing the empirical risk means reducing the classification error on the training dataset, while minimizing the structural risk amounts to decreasing the classification error on unseen patterns. Since learning algorithms are mostly based on minimizing only the empirical risk, this is an important advantage of SVMs over other classifiers.
In order to train SVMs, one must solve a quadratic programming problem (also called a quadratic optimization problem). SVMs have two formulations of this problem, the primal and the dual. The dual optimization problem is stated in terms of Lagrange multipliers and the bias. There are a few methods for obtaining the Lagrange multipliers and the bias, such as sequential minimal optimization (SMO) (Platt, 1999), the kernel adatron (KA) (Anlauf & Biehl, 1989), and classical mathematical methods based on numerical optimization (Gill, Murray, & Wright, 1981). SMO solves the SVM quadratic programming problem analytically by decomposing it into quadratic programming sub-problems and, at each step, solving the smallest possible optimization problem, which involves only two Lagrange multipliers. KA is an online method that solves the SVM quadratic programming problem using first-order (gradient) information. Classical mathematical methods, henceforth called QP, solve the quadratic programming problem by numerical optimization, which unfortunately demands the manipulation of large matrices and therefore leads to more numerical precision errors. Other strategies for efficiently training SVMs can be found in Djuric, Lan, Vucetic, and Wang (2013) and Frandi, Ñanculef, Lodi, Sartori, and Suykens (2015).
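To make the KA idea above concrete, the sketch below gives an illustrative adatron-style pass (the function name, learning rate, and optional clipping are our assumptions, not a prescribed implementation): each multiplier is increased while the functional margin of its pattern is below one, and kept non-negative.

```python
import numpy as np

def kernel_adatron_step(alpha, K, y, lr=0.1, C=None):
    """One online pass of an adatron-style update:
    alpha_i <- alpha_i + lr * (1 - y_i * f(x_i)), clipped at zero
    (and at C for the soft-margin case), where f(x_i) = sum_j alpha_j y_j K_ij."""
    for i in range(len(alpha)):
        margin = y[i] * (K[i] @ (alpha * y))
        alpha[i] = max(0.0, alpha[i] + lr * (1.0 - margin))
        if C is not None:
            alpha[i] = min(alpha[i], C)
    return alpha
```

On a separable problem, repeated passes drive every functional margin toward one, at which point the updates vanish.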
Metaheuristics such as simulated annealing (SA) and genetic algorithms (GAs) are an alternative to methods based on numerical optimization, such as those included in classical mathematical libraries. SA also aims at generating useful solutions to search problems. Due to the underlying features of these metaheuristics, some optimization problems can be solved without assuming linearity, differentiability, continuity or convexity of the objective function. Unfortunately, these desirable properties are not found in several classical mathematical methods. Metaheuristics have been used with SVMs in classification tasks mainly to tune their parameters, select features and obtain reduced sets of support vectors (da Rocha Neto & Barreto, 2013; Silva, Silva, & Neto, 2015).
The first attempt to use metaheuristics (GAs, in fact) to solve the primal problem of SVMs for classification tasks demanded some changes to the primal problem formulation (Dumitrescu, Preuss, Stoean, & Stoean, 2006; Stoean, Preuss, Stoean, & Dumitrescu, 2007). A similar approach in terms of modeling was also proposed to train SVMs for regression tasks (Stoean, Dumitrescu, Preuss, & Stoean, 2006), in which the proposed regression method models a solution of the primal optimization problem. Besides these attempts, a GA-based proposal over the dual formulation was presented in Mierswa (2006); however, the problem was changed in order to leave one of the constraints out, and the bias was set to zero.
Another attempt, based on linear particle swarm optimization (PSO) (Paquet & Engelbrecht, 2003a, 2003b), and its extensions (Li, Tong, Bai, & Zhang, 2007; Silva & Gonçalves, 2013; Yuan, Zhang, Zhang, & Yang, 2006) were also applied to train SVMs. However, in such proposals the Karush-Kuhn-Tucker (KKT) conditions are not fully satisfied. Therefore, to the best of our knowledge, there is no metaheuristic-based method in the literature that deals with the quadratic optimization problem of the SVM in its standard dual formulation and, as a consequence, also keeps the primal problem as initially proposed by Vapnik (1998).
In this paper, we introduce a novel learning approach based on simulated annealing for training easily (SATE) support vector machines. To do so, we model an instance of SA to handle the dual optimization problem and its constraints in order to obtain the Lagrange multipliers and the bias for the decision function. In terms of SA, we propose a solution modeling, as well as a neighborhood and an energy function.
The remainder of this paper is organized as follows. In Section 2, we review the fundamentals of support vector machines. In Section 3, some learning algorithms for SVMs are presented. After that, in Section 4, we briefly introduce simulated annealing, and in Section 5 we present our proposal. In Section 6, we report our simulations and, finally, the paper is concluded in Section 7.
Section snippets
Support vector machines
Consider a training dataset \(\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}\), where \(\mathbf{x}_i \in \mathbb{R}^p\) is an input vector and \(y_i \in \{-1, +1\}\) is the corresponding class label. For soft margin classification, the SVM (Vapnik, 1998) primal problem is defined as
\[
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0,
\]
where \(\boldsymbol{\xi} = \{\xi_i\}_{i=1}^{N}\) is the set of slack variables, b is the bias and C is a cost parameter that controls the trade-off between allowing training errors and forcing rigid margins.
The SVM dual optimization problem can also be presented as
\[
\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)
\quad \text{subject to} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{N} \alpha_i y_i = 0,
\]
where \(\alpha_i\) are the Lagrange multipliers and \(K(\cdot, \cdot)\) is a kernel function.
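For concreteness, the dual objective and its constraints can be evaluated directly. The sketch below (helper names are ours) uses a linear kernel:

```python
import numpy as np

def dual_objective(alpha, X, y):
    """Soft-margin SVM dual objective:
    sum_i alpha_i - 1/2 * sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j)."""
    K = X @ X.T                          # linear kernel Gram matrix
    Q = (y[:, None] * y[None, :]) * K    # Q_ij = y_i y_j K_ij
    return alpha.sum() - 0.5 * alpha @ Q @ alpha

def is_feasible(alpha, y, C, tol=1e-8):
    """Dual constraints: 0 <= alpha_i <= C and sum_i alpha_i y_i = 0."""
    return bool(np.all(alpha >= -tol) and np.all(alpha <= C + tol)
                and abs(alpha @ y) <= tol)
```

Any training method for the dual, whatever its search strategy, must maximize `dual_objective` over the feasible set tested by `is_feasible`.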
Solutions for SVM
In this section, we present some learning methods used to train support vector machines.
Simulated annealing
Simulated Annealing (SA) is an adaptation of the Metropolis–Hastings algorithm and was independently described in Kirkpatrick, Gelatt, and Vecchi (1983) and Černý (1985). SA is a popular local search metaheuristic used to deal with discrete and continuous optimization problems. The key feature of simulated annealing is that it provides a means to escape from local optima by allowing hill-climbing moves towards worse objective function values, in the hope that a global optimum is found.
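The acceptance rule just described — always accept improving moves, and accept worsening moves with probability exp(−ΔE/T) — can be sketched as a generic minimizer (function names and the geometric cooling schedule are our choices, not the paper's):

```python
import math
import random

def anneal(energy, neighbor, x0, t0=1.0, cooling=0.95,
           steps_per_temp=100, t_min=1e-3):
    """Generic simulated annealing minimizer with geometric cooling."""
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    while t > t_min:
        for _ in range(steps_per_temp):
            cand = neighbor(x)
            delta = energy(cand) - e
            # Accept improvements always; accept worse moves
            # with probability exp(-delta / t) (Metropolis criterion).
            if delta <= 0 or random.random() < math.exp(-delta / t):
                x, e = cand, e + delta
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling
    return best_x, best_e
```

For example, `anneal(lambda x: (x - 3.0) ** 2, lambda x: x + random.uniform(-0.5, 0.5), 0.0)` drifts toward the minimizer at 3; at high temperature the walk explores broadly, while at low temperature it behaves like hill climbing.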
Proposal: simulated annealing for training easily SVM
Our proposal, named SATE, relies on the simulated annealing algorithm, which maximizes the expression presented in Eq. (4). As expected, it is an algorithm for obtaining the Lagrange multipliers and the bias. In order to describe our proposal as a SA-based optimization problem, we explain in detail the solution modeling, the neighborhood function Next(α) and the energy function H(α). We present these requirements in the next subsections.
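The precise solution modeling, neighborhood and energy functions are defined in the subsections that follow. Purely as an illustration (the pairwise move, which echoes SMO's two-multiplier update, and all names here are our assumptions, not the paper's definitions), a neighborhood that preserves both dual constraints can perturb a pair of multipliers at once, and the energy can be the negated dual objective so that SA's minimization maximizes Eq. (4):

```python
import numpy as np

rng = np.random.default_rng(42)

def neighbor(alpha, y, C, step=0.1):
    """Perturb a random pair (i, j): alpha_i += d, alpha_j -= y_i*y_j*d.
    This keeps sum_k alpha_k y_k unchanged; d is drawn so that both
    coordinates stay inside the box [0, C]."""
    a = alpha.copy()
    i, j = rng.choice(len(a), size=2, replace=False)
    s = y[i] * y[j]
    if s > 0:   # alpha_j moves by -d
        lo, hi = max(-a[i], a[j] - C), min(C - a[i], a[j])
    else:       # alpha_j moves by +d
        lo, hi = max(-a[i], -a[j]), min(C - a[i], C - a[j])
    d = rng.uniform(max(lo, -step), min(hi, step))
    a[i] += d
    a[j] -= s * d
    return a

def energy(alpha, Q):
    """Negated dual objective (SA minimizes energy), with Q_ij = y_i y_j K_ij."""
    return -(alpha.sum() - 0.5 * alpha @ Q @ alpha)
```

Starting from a feasible point (e.g. the zero vector), every candidate produced by `neighbor` is itself feasible, so no repair or penalty term is needed inside the annealing loop.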
Simulations and discussion
Initially, as a proof of concept, we trained SVMs with SMO, KA, QP and SATE to solve two artificial problems. The first problem, called Artificial Problem I (API), consists of a linearly separable two-dimensional dataset, in which data instances within each class are independent and uniformly distributed with the same within- and between-class variances. The second problem, named Artificial Problem II (APII), consists of a dataset similar to API, but with some overlap between the classes.
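A dataset along the lines of API/APII can be generated as follows; this is a hypothetical reconstruction (the unit squares and the `shift` parameter are our assumptions), in which the classes are linearly separable for shift > 1 and overlap otherwise:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_uniform_classes(n_per_class=50, shift=2.0):
    """Two 2-D classes, each uniform on a unit square; the positive class's
    square is shifted diagonally by `shift`. Both classes share the same
    within-class distribution, so their variances match."""
    x_neg = rng.uniform(0.0, 1.0, size=(n_per_class, 2))
    x_pos = rng.uniform(0.0, 1.0, size=(n_per_class, 2)) + shift
    X = np.vstack([x_neg, x_pos])
    y = np.array([-1] * n_per_class + [1] * n_per_class)
    return X, y
```

With `shift=2.0` the classes are separable (an API-like setting), while `shift=0.5` makes the squares overlap (an APII-like setting).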
Conclusion
We proposed an algorithm named SATE to obtain the Lagrange multipliers and the bias of SVMs through simulated annealing. The SVM constraints were successfully embedded into a version of the simulated annealing algorithm, so that the quadratic optimization problem was described within the simulated annealing framework. To do so, we modeled the neighborhood function and the energy function. SATE was compared with SMO, KA, a QP solver, as well as with PSO- and GA-based versions.
Acknowledgments
The authors would like to thank the IFCE and CAPES for supporting their research.
References (26)
Simulated annealing: A tool for operational research. European Journal of Operational Research (1990).
Evolutionary support vector regression machines. 8th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2006), 26–29 September 2006, Timisoara, Romania (2006).
The adatron: An adaptive perceptron algorithm. Europhysics Letters (1989).
UCI repository of machine learning databases (1998).
Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. Journal of Optimization Theory and Applications (1985).
Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research (2006).
Evolutionary support vector machines: A dual approach. 2016 IEEE Congress on Evolutionary Computation (CEC) (2016).
BudgetedSVM: A toolbox for scalable SVM approximations. Journal of Machine Learning Research (2013).
Evolutionary support vector machines and their application for classification (2006).
Practical methods of optimization (2013).