Elsevier

Applied Soft Computing

Volume 64, March 2018, Pages 356-365

MS-SVM: Minimally Spanned Support Vector Machine

https://doi.org/10.1016/j.asoc.2017.12.017

Highlights

  • We propose the Minimally Spanned Support Vector Machine (MS-SVM) algorithm, which reduces the number of support vectors relative to a standard SVM.

  • This can significantly reduce the time required to classify a test data point.

  • The MS-SVM algorithm is equally effective with both linear and non-linear SVMs; the minimum spanning tree can be computed either in the input space or in the feature space without altering the performance much.

  • The MS-SVM algorithm can discriminate between classes with complex orientation.

  • Experimental results on several real data sets as well as on a synthetic data set show that MS-SVM can significantly reduce the number of support vectors (sometimes even more than 80%) without sacrificing performance.

Abstract

For a Support Vector Machine (SVM), the time required to classify an unknown data point is proportional to the number of support vectors. For some real-time applications, using an SVM can therefore be a problem if the number of support vectors is high. Depending on the complexity of the class structure, the number of support vectors of an SVM model sometimes increases with the number of training data points. Our objective here is to reduce the number of support vectors while maintaining more or less the same level of accuracy as a normal SVM that does not use any reduction of support vectors. An SVM finds a separating hyperplane that maximizes the margin of separation, so the location of the hyperplane depends primarily on a set of "boundary points". We therefore first identify some boundary points using a minimum spanning tree (MST) on the training data to obtain a reduced training set, and then apply the SVM algorithm to the reduced training data to generate the classification model. We call this algorithm the Minimally Spanned Support Vector Machine (MS-SVM). We also assess the performance obtained by relaxing the definition of boundary points. Moreover, we extend the algorithm to a feature space using a kernel transformation; in this case, the MST is generated in the feature space using the associated kernel matrix. Our experimental results demonstrate that the proposed algorithm can considerably reduce the number of support vectors without affecting the overall classification accuracy, irrespective of whether the MST is generated in the input space or in the feature space. Thus the MS-SVM algorithm can be used instead of SVM for efficient classification.

Introduction

The Support Vector Machine (SVM) algorithm proposed by Vapnik [1] has been successfully applied in many applications over the past decades. Its good generalization capability in separating two classes has made it widely accepted. However, the run time of an SVM model when classifying a data point is directly related to the number of support vectors, which can be problematic for real-time applications, such as brain-computer interface (BCI) systems, where an instantaneous decision has to be made based on continuous signals. To classify a data point, an SVM computes the dot product of the given test point with every support vector, either in the input space or in the feature space after transformation via a kernel function. Thus, the execution time increases with the number of support vectors.
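
To make this dependence concrete, the decision function of a trained SVM has the form f(x) = Σ_i α_i y_i K(s_i, x) + b, where the sum runs over the support vectors s_i. The following Python sketch illustrates the point; the RBF kernel and its γ value are illustrative assumptions, not taken from the paper:

    import numpy as np

    def svm_decision(x, support_vectors, dual_coefs, b, kernel):
        # f(x) = sum_i alpha_i * y_i * K(s_i, x) + b.
        # One kernel evaluation per support vector, so the test-time
        # cost grows linearly with the number of support vectors.
        return sum(c * kernel(sv, x)
                   for c, sv in zip(dual_coefs, support_vectors)) + b

    # Illustrative RBF kernel; gamma = 0.5 is an assumed value.
    def rbf(u, v, gamma=0.5):
        return float(np.exp(-gamma * np.linalg.norm(np.asarray(u) - np.asarray(v)) ** 2))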

In the past, researchers have proposed various methods to reduce the number of support vectors [[2], [3], [4], [5], [6], [7], [8], [9], [10], [13], [14]]. In [2] the authors proposed the ν-SVM algorithm, where the parameter ν is used to control the number of support vectors; however, it may not always give a small set of support vectors [3]. Li et al. [3] presented an algorithm in which the training data points of each class are clustered separately using the k-means algorithm and only the cluster centers are used to train an SVM model. The number of support vectors is controlled by the choice of k for each class; when k equals the number of data points in a class, the method reduces to the normal SVM. Determining a suitable value of k for each class, however, requires further investigation. It is worth noting that if only the cluster centers are used for training, the SVM may not capture the class boundaries properly: a cluster center represents the core of a cluster, not its boundary, while boundary points are more important for defining the classifier learned by an SVM. In [4] Downs et al. proposed an algorithm to remove linearly redundant support vectors. The authors of [6] heuristically divided the training data into smaller chunks such that every training data point lies in exactly one chunk. Each chunk is then used to find support vectors, and only these support vectors are kept for further training [8]. Although the time required to train an SVM model with this algorithm is considerably high, it has the added advantage of tackling memory constraints. Dagli et al. [7] introduced the concept of fuzziness for identifying irrelevant training data points: the class membership of each training data point is calculated from its k-nearest neighbors, the training data points with crisp class membership are discarded as irrelevant, and the SVM model is trained on the remaining data points. The k-means algorithm has also been used in [[8], [9]] to remove irrelevant data points. In [8] the radii of the clusters obtained using k-means are increased or decreased to make them the largest possible crisp clusters containing at least a minimum number of data points; the training data points in the cores of these crisp clusters are then removed as irrelevant. In [10] Kumar et al. also clustered each class separately, using different clustering algorithms, and then used the cluster centers to train SVMs, thereby reducing the number of support vectors as in [3]. Note that the philosophy of [8] is to discard the core of each cluster, while that of [[3], [10]] is to use the core of a cluster, as represented by its center. In [13] the authors proposed an interesting method to reduce the number of support vectors of a fuzzified version of SVM: they used the l0-norm of the vector of Lagrange multipliers as a regularizer in the dual problem of the fuzzy SVM. Although the l0-norm is a natural choice for reducing the number of support vectors, it makes the optimization problem non-convex, which is then solved using a continuous approximation function. In [14] the authors use the generalized support vector machine along with a rectangular kernel. While finding the nonlinear separating surface, they solve an optimization problem with only a few variables (corresponding to about 10% of the data points, selected randomly) but use the entire dataset as constraints, so that once the decision surface is found, almost 90% of the data can be discarded.
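
As an illustration of the cluster-center approach of [3] and [10], the sketch below trains an SVM on per-class k-means centers rather than on the full training set. It uses scikit-learn; the choice of k = 10 and of the RBF kernel are assumptions for illustration only:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def cluster_center_svm(X, y, k=10):
        # Cluster each class separately and keep only the k centers,
        # then train a standard SVM on the centers (cf. [3], [10]).
        centers, labels = [], []
        for c in np.unique(y):
            km = KMeans(n_clusters=k, n_init=10).fit(X[y == c])
            centers.append(km.cluster_centers_)
            labels.append(np.full(k, c))
        return SVC(kernel="rbf").fit(np.vstack(centers), np.hstack(labels))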

In this paper we propose an algorithm, with variants, to reduce the number of support vectors of an SVM. We call this method the Minimally Spanned Support Vector Machine (MS-SVM). A minimum spanning tree (MST) algorithm is first applied to the training dataset to remove data points that are far away from the expected decision boundary, and the SVM model is then built on the remaining data points. The effectiveness of the proposed algorithm has been demonstrated on a set of benchmark datasets, considering both linear and non-linear SVMs. The MST can be obtained either in the input space or in the feature space. An augmented version of the algorithm is also proposed for datasets with very well separated classes. The rest of the paper is organized as follows: Section 2 gives a brief introduction to the minimum spanning tree and the SVM algorithm, and then proposes the minimally spanned support vector machine algorithm. In Section 3, experimental results on a set of benchmark datasets are discussed, and finally the paper is concluded in Section 4.
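
A minimal sketch of the input-space variant is given below. The exact boundary-point criterion is defined in the full text; here we assume one plausible reading, namely keeping the endpoints of MST edges that join points of different classes. SciPy's MST routine stands in for an explicit Prim implementation, and the kernel and C are illustrative defaults:

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree
    from scipy.spatial.distance import pdist, squareform
    from sklearn.svm import SVC

    def ms_svm(X, y, kernel="rbf", C=1.0):
        y = np.asarray(y)
        # Build an MST on the training data in the input space.
        D = squareform(pdist(X))                # pairwise Euclidean distances
        mst = minimum_spanning_tree(D).tocoo()  # |V| - 1 edges
        # Assumed boundary-point rule: endpoints of MST edges whose
        # two endpoints carry different class labels.
        cross = y[mst.row] != y[mst.col]
        keep = np.unique(np.concatenate([mst.row[cross], mst.col[cross]]))
        # For very well separated classes no such edge may exist; the
        # augmented version of the algorithm addresses that case.
        return SVC(kernel=kernel, C=C).fit(X[keep], y[keep])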

Section snippets

Minimum spanning tree

In a connected, undirected, weighted graph G(V, E), a minimum spanning tree (MST) is a sub-graph G′(V, E′) that connects all the vertices using |E′| = |V| − 1 edges (one less than the number of vertices) such that the sum of the weights of the edges in E′ is minimal [11]. Note that a graph may have multiple MSTs. There are several algorithms for finding a minimum spanning tree of a graph; in this work we have used Prim's algorithm [12]. Since an MST is a minimum-weight
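
The following sketch implements Prim's algorithm on a dense distance matrix, together with the kernel-trick identity ‖φ(x_i) − φ(x_j)‖² = K_ii + K_jj − 2K_ij, which allows the MST to be built in the feature space from the kernel matrix alone, as the abstract mentions. The RBF kernel and γ below are assumed for illustration:

    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    def prim_mst(D):
        # Prim's algorithm on a dense symmetric distance matrix D.
        # Returns the |V| - 1 MST edges as (parent, child, weight) triples.
        n = D.shape[0]
        in_tree = np.zeros(n, dtype=bool)
        best = np.full(n, np.inf)          # cheapest known edge into the tree
        parent = np.full(n, -1)
        best[0] = 0.0                      # grow the tree from vertex 0
        edges = []
        for _ in range(n):
            u = int(np.argmin(np.where(in_tree, np.inf, best)))
            in_tree[u] = True
            if parent[u] >= 0:
                edges.append((parent[u], u, D[parent[u], u]))
            closer = ~in_tree & (D[u] < best)   # relax edges out of u
            best[closer] = D[u][closer]
            parent[closer] = u
        return edges

    def feature_space_distances(X, gamma=0.5):
        # Distances in the kernel-induced feature space via the kernel
        # trick: ||phi(x_i) - phi(x_j)||^2 = K_ii + K_jj - 2 K_ij.
        K = rbf_kernel(X, gamma=gamma)      # kernel (Gram) matrix
        d = np.diag(K)
        return np.sqrt(np.clip(d[:, None] + d[None, :] - 2.0 * K, 0.0, None))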

Results

The effectiveness of the minimally spanned support vector machine algorithm is demonstrated on 12 benchmark datasets (including three high-dimensional ones) and one synthetically generated two-class dataset. The synthetic dataset, named Synthetic, consists of 200 points: 100 two-dimensional points in each of the two classes, randomly generated from a Gaussian distribution. The results depicted in the subsequent tables are generated using a two-level 10-fold cross validation
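
The excerpt does not specify the Gaussian parameters of the Synthetic dataset, so the following is only a hedged reconstruction of such a two-class set, with assumed means and identity covariances:

    import numpy as np

    rng = np.random.default_rng(0)
    # 100 two-dimensional points per class, drawn from Gaussians; the
    # means and covariances below are illustrative assumptions.
    class_a = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=100)
    class_b = rng.multivariate_normal([3.0, 3.0], np.eye(2), size=100)
    X = np.vstack([class_a, class_b])
    y = np.hstack([np.zeros(100), np.ones(100)])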

Conclusion and discussion

The support vector machine algorithm for classification is widely used in various application domains. However, the time required for classification, which increases with the number of support vectors, may make it infeasible for some real-time applications when the support vectors are large in number. Moreover, training the SVM with a reduced dataset shortens the learning time, as the optimization problem then involves fewer data points; this is especially relevant when the original dataset is large. In this work we

References (15)

  • [1] V. Vapnik, The Nature of Statistical Learning Theory (1995)

  • [2] B. Scholkopf et al., New support vector algorithms, Neural Comput. (2000)

  • [3] Q.-A. Tran et al., Reduce the number of support vectors by using clustering techniques, in: Proceedings of the Second International Conference on Machine Learning and Cybernetics, Xi’an (2003)

  • [4] T. Downs et al., Exact simplification of support vector solutions, J. Mach. Learn. Res. (2001)

  • [5] D. Geebelen et al., Reducing the number of support vectors of SVM classifiers using the smoothed separable case approximation, IEEE Trans. Neural Netw. Learn. Syst. (2012)

  • [6] T. Joachims, Making Large-Scale SVM Learning Practical, in: Advances in Kernel Methods: Support Vector Learning (1999)

  • [7] S. Sohn et al., Advantages of using fuzzy class memberships in self-organizing map and support vector machines, in: Proceedings of the International Joint Conference on Neural Networks (2001)
There are more references available in the full text version of this article.
