Neurocomputing

Volume 211, 26 October 2016, Pages 66-71

Computational performance optimization of support vector machine based on support vectors

https://doi.org/10.1016/j.neucom.2016.04.059

Abstract

The computational performance of the support vector machine (SVM) depends mainly on the size and dimension of the training sample set. Because support vectors are decisive in determining the SVM classification hyperplane, a method for optimizing the computational performance of SVM based on support vectors is proposed. On the one hand, during the selection of SVM super-parameters and under the precondition that no potential support vectors are lost, we use the Karush-Kuhn-Tucker condition to eliminate non-support vectors from the training sample set, thereby reducing the sample size and the computational complexity of SVM. On the other hand, we propose a simple intrinsic dimension estimation method for the SVM training sample set by analyzing the correlation between the number of support vectors and the intrinsic dimension. Comparative experimental results indicate that the proposed method can effectively improve computational performance.

Introduction

The support vector machine (SVM) proposed by Vapnik [1] is one of the most important machine learning models. SVMs achieve good performance by implementing the structural risk minimization principle and introducing the kernel trick [2]. Owing to their strong generalization ability and their suitability for small-sample and nonlinear problems, SVMs have been successfully applied to classification decision [3], [4], regression modeling [5], fault diagnosis [6], [7] and bioinformatics [8], [9]. Although the SVM can effectively avoid the ‘curse of dimensionality’, the degradation of computational performance caused by increases in sample size or dimension remains an open problem. It is usually addressed in two ways: one is to improve the learning algorithm, such as sequential minimal optimization (SMO) [10], successive overrelaxation (SOR) [11] and LIBSVMCBE [12]; the other is to simplify the computation by reducing the size or dimension of the training sample set. Both routes are studied in this paper.

When dealing with large datasets, reducing the scale of the training samples is one of the most important ways to improve computational efficiency. Several major difficulties arise when large data lead to a fully dense nonlinear kernel matrix. To overcome these computational difficulties, some authors have proposed low-rank approximations to the full kernel matrix. As an alternative, Lee and Mangasarian [13] proposed the reduced support vector machine (RSVM). The key ideas of the RSVM are as follows. Prior to training, it selects a small random subset of the training samples to generate a thin rectangular kernel matrix, and then uses this much smaller rectangular kernel matrix in place of the full kernel matrix in the nonlinear SVM formulation. Ke and Zhang [14] proposed the editing support vector machine (ESVM), which removes some samples near the boundary from the training set. Its basic scheme is similar to that of the editing nearest neighbor methods in statistical pattern recognition: the training set is randomly divided into subsets, one subset is used to edit the other, and the final decision boundary is obtained from the remaining samples. A reverse algorithm was proposed by Koggalage and Halgamuge [15], in which the training set is reduced by removing samples that make no contribution to classification. In summary, the basic idea of the methods mentioned above is to represent the original training sample set by finding more suitable alternatives. Generally speaking, the more representative the reduced dataset is, the better the performance achieved by the SVM. However, the reduced dataset in the above methods is chosen randomly and may therefore lack representativeness.
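The rectangular-kernel idea behind RSVM can be illustrated with a short sketch. The RBF kernel, the subset size m and the ridge-style least-squares surrogate below are illustrative assumptions for exposition, not the exact formulation of [13]:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))                      # full training set (l x k)
y = np.sign(X[:, 0] + 0.5 * rng.normal(size=2000))   # toy labels in {-1, +1}

m = 100                                               # size of the random subset
idx = rng.choice(len(X), size=m, replace=False)
X_sub = X[idx]

# Thin rectangular kernel matrix K(X, X_sub): l x m instead of the full l x l matrix
K_rect = rbf_kernel(X, X_sub, gamma=0.1)

# Illustrative regularized least-squares fit of coefficients over the m columns only
lam = 1.0
A = K_rect.T @ K_rect + lam * np.eye(m)
coef = np.linalg.solve(A, K_rect.T @ y)

# Decision values for new points depend only on the m reference samples
def decision(X_new):
    return rbf_kernel(X_new, X_sub, gamma=0.1) @ coef

The point of the sketch is that both training and prediction touch an l-by-m matrix rather than the dense l-by-l kernel, which is where the computational saving of the reduced formulation comes from.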

When dealing with large datasets, reducing the dimension of the training samples is another important means of improving the computational efficiency of SVM. There are two ways to realize dimensionality reduction (DR): feature selection and feature extraction. Feature selection simplifies the training sample set by selecting a subset of relevant features from the original input features and eliminating irrelevant ones [16], [17]. Feature extraction, by applying a mathematical transformation to the input features, projects the high-dimensional data into a low-dimensional space. Of course, dimensionality reduction has its downside, since any projection results in a loss of information from the original high-dimensional data. The difficulty in DR is therefore to obtain the simplest low-dimensional representation under the condition that the essential features of the original high-dimensional data are retained, that is, to find the lowest possible dimension (the intrinsic dimension) [18] while retaining as many of the essential features as possible.
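The two DR routes can be contrasted in a few lines. This is only an illustrative sketch on synthetic data; the target dimension of 20 and the scoring function are arbitrary example choices:

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 100))          # 500 samples, 100 original features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy binary labels

# Feature selection: keep a subset of the original features (here ranked by ANOVA F-score)
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)

# Feature extraction: project onto 20 linear combinations of all features (PCA)
X_ext = PCA(n_components=20).fit_transform(X)

print(X_sel.shape, X_ext.shape)          # both (500, 20), but with different meanings

Selection keeps interpretable original features, while extraction mixes all features into new coordinates; in both cases the quality of the result hinges on choosing the reduced dimension well, which motivates the intrinsic dimension estimation discussed next.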

For the various dimensionality reduction methods, the intrinsic dimension is a critical parameter that still requires further study. Accurate estimation of the intrinsic dimension of high-dimensional data plays an important role in the subsequent dimensionality reduction. Current estimation methods can be classified into two categories: one is the algebra-based eigenvalue method [19], while the other comprises geometry-based methods such as the maximum likelihood estimator (MLE) [20], correlation dimension (CorrDim) [21], packing numbers [22], nearest neighbor dimension (NearNbDim) [23] and the geodesic minimum spanning tree (GMST) [24]. The eigenvalue method estimates the intrinsic dimension by sorting the eigenvalues of the covariance matrix and then determining the number of important features by thresholding the eigenvalues (e.g., the accumulative contribution rate in PCA). However, there is no principled way to choose the threshold value or the proper number of features to retain, and the estimation is unsatisfactory on nonlinear manifolds. By establishing a likelihood function for neighbor distances, MLE obtains a maximum likelihood estimate of the intrinsic dimension; research by Belkin indicates that MLE is an unbiased estimator [25]. In the NearNbDim method, the neighborhood graph is built by calculating Euclidean distances between samples, so the computational complexity grows exponentially with the sample size, making the method unsuitable for large datasets. In the GMST method, the neighborhood graph is instead built by replacing the Euclidean distance with the geodesic distance, which also incurs a high computational cost.
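As an illustration of the eigenvalue-based route, a common heuristic takes the intrinsic dimension to be the number of principal components needed to reach a given accumulative contribution rate. The 95% threshold below is an assumed example value, not one prescribed by [19]:

import numpy as np
from sklearn.decomposition import PCA

def eigenvalue_intrinsic_dim(X, threshold=0.95):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches the given threshold."""
    pca = PCA().fit(X)
    cum = np.cumsum(pca.explained_variance_ratio_)
    return int(np.searchsorted(cum, threshold) + 1)

# Toy data that is genuinely 3-dimensional but embedded in 50 dimensions
rng = np.random.default_rng(2)
Z = rng.normal(size=(1000, 3))
A = rng.normal(size=(3, 50))
X = Z @ A + 0.01 * rng.normal(size=(1000, 50))

print(eigenvalue_intrinsic_dim(X))  # close to 3 for this toy example

The dependence of the result on the arbitrary threshold is exactly the weakness of the eigenvalue method noted above.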

It is well known that the computational performance of SVM also depends strongly on the selection of super-parameters, so their optimal choice is of critical importance for obtaining good performance when handling pattern recognition problems with SVMs. Since SVMs were introduced, much effort has been devoted to developing efficient methods for optimizing super-parameters. The commonly used selection methods, including empirical selection, grid search, gradient descent and intelligent optimization [26], [27], have defects such as low efficiency, high computational complexity or undesirable super-parameters. Determining appropriate super-parameters is particularly difficult when the training sample set is large.
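For reference, the standard cross-validated grid search against which such methods are usually compared looks as follows; the RBF kernel and the particular C/gamma grid are illustrative assumptions:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=600, n_features=20, random_state=0)

# Exhaustive search over a log-spaced (C, gamma) grid with 5-fold cross-validation.
# Cost grows with both the grid size and the number of training samples,
# which is the inefficiency pointed out in the text for large datasets.
param_grid = {"C": np.logspace(-2, 3, 6), "gamma": np.logspace(-4, 1, 6)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)

print(search.best_params_, search.best_score_)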

In summary, sample size reduction, intrinsic dimension estimation and super-parameter selection are research focuses in the field of SVM. Related research has been carried out and reported, but existing methods address only one of these topics and none covers all of them. Since the classification hyperplane of SVM is entirely determined by the support vectors (SVs) [28], a novel method for optimizing the computational performance of SVM based on support vectors is proposed in this paper. The paper makes three contributions: (1) all three topics are addressed simultaneously from the point of view of support vectors; (2) we present a method that properly eliminates non-support vectors from the original training sample set by using the Karush-Kuhn-Tucker (KKT) condition, so that the computational complexity is reduced through a smaller sample size; (3) we design a simple intrinsic dimension estimation method by analyzing the correlation between the number of support vectors and the intrinsic dimension, which is verified by experiment.

The paper is organized as follows. SVM is briefly introduced in Section 2 while the proposed method based on support vectors is described in Section 3, followed by experimental validation, discussion and then conclusions.

Section snippets

SVM model

A task of supervised binary classification is considered. We denote the training sample set as $T=\{(x_i, y_i) \mid x_i \in \mathbb{R}^k\}$, where $x_i$ is a training sample labeled with class $y_i \in \{-1, 1\}$. The goal of SVM is to maximize classification accuracy by widening as much as possible the margin between the decision hyperplane and the data. For a classical linear SVM, a class label is assigned according to which side of the decision hyperplane a sample lies on. The hyperplane is defined as [1]
$$w^T x + b = 0.$$
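For concreteness, the quantities above can be read off a fitted linear SVM. This is a minimal sketch on synthetic data, assuming a scikit-learn implementation; the paper itself does not depend on any particular library:

import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Linearly separable toy data with labels mapped to {-1, +1}
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
y = 2 * y - 1

clf = SVC(kernel="linear", C=1.0).fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane w^T x + b = 0
b = clf.intercept_[0]   # bias term b
print("support vectors:", len(clf.support_))   # only these samples determine the hyperplane

# Class label assigned by the sign of the decision function
pred = np.sign(X @ w + b)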

Sample size reduction based on SVs

For the support vector machine, the classification hyperplane is entirely determined by the support vectors [29]. Therefore, the key problem in sample size reduction is to select a small number of representative training samples from the large training sample set while retaining, as far as possible, all support vectors. If the support vectors change significantly, computational performance will inevitably be affected.

The SVM's optimal solution $\alpha^{*}=(\alpha_1^{*},\ldots,\alpha_l^{*})^T$ can be obtained by solving the dual quadratic programming problem.
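The general flavour of SV-based sample reduction can be sketched as: train once, keep only the samples with non-zero dual coefficients, and retrain on the reduced set. The sketch below is only an illustration of that idea on synthetic data; the paper's actual criterion is derived from the KKT conditions during super-parameter selection and also preserves potential support vectors, which is not reproduced here:

import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)

# Initial training on the full set: samples with alpha_i > 0 are the support vectors
clf_full = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
sv_idx = clf_full.support_                      # indices of support vectors (alpha_i > 0)

# Naive reduction: keep only the SVs; a careful method would also keep samples
# that could become SVs under other super-parameters (potential support vectors)
X_red, y_red = X[sv_idx], y[sv_idx]
clf_red = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_red, y_red)

print(len(X), "->", len(X_red), "training samples")
print("agreement on full set:", np.mean(clf_full.predict(X) == clf_red.predict(X)))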

Experimental results

In order to verify the validity and superiority of the proposed method, experiments were carried out on 3 remote sensing datasets (KSC, ROSIS and Salinas) and 5 UCI datasets (100PSL, ISOLET, USPS, HAR and Gas), as shown in Table 1.

The comparative experiments consist of two parts. In the first part, the proposed method (denoted DR&I) was compared with EigenValue [19], MLE [20] and GMST [24]. The parameters of these intrinsic dimension estimation methods are set as follows: (1) PCA is adopted

Conclusions

Support vector machines play a very important role in data classification because of their good generalization performance. Although the SVM can effectively avoid the ‘curse of dimensionality’, the degradation of computational performance caused by increases in sample size or dimension remains unsolved. As support vectors play a decisive role in determining the SVM classification hyperplane, in this paper we study how to improve computational performance from the support vectors point of view.

Acknowledgements

This work is supported by National Natural Science Foundation of China (61273143, 61472424) and Fundamental Research Funds for the Central Universities (2013RC12).

References (32)

Xuesong Wang received the PhD degree from China University of Mining and Technology in 2002. She is currently a professor in the School of Information and Electrical Engineering, China University of Mining and Technology. Her main research interests include machine learning, bioinformatics, and artificial intelligence. In 2008, she was the recipient of the New Century Excellent Talents in University from the Ministry of Education of China.

Fei Huang received the master's degree from China University of Mining and Technology in 2014. His main research interests include support vector machines.

Yuhu Cheng received the PhD degree from the Institute of Automation, Chinese Academy of Sciences in 2005. He is currently a professor in the School of Information and Electrical Engineering, China University of Mining and Technology. His main research interests include machine learning, transfer learning, and intelligent systems. In 2010, he was the recipient of the New Century Excellent Talents in University award from the Ministry of Education of China.
