Neurocomputing, Volume 313, 3 November 2018, Pages 196-205

Improvements on least squares twin multi-class classification support vector machine

https://doi.org/10.1016/j.neucom.2018.06.040

Abstract

Recently, the least squares twin multi-class support vector machine (LSTKSVC) was proposed as a least squares version of the twin multi-class classification support vector machine (Twin-KSVC), both based on the twin support vector machine (TWSVM). In this paper, we propose a novel multi-class classifier, termed Improvements on least squares twin multi-class classification support vector machine, motivated by LSTKSVC and Twin-KSVC. Similarly to LSTKSVC, which evaluates all the training data in a "1-versus-1-versus-rest" structure, the proposed algorithm generates ternary outputs $\{-1, 0, +1\}$. Whereas Twin-KSVC needs to solve two quadratic programming problems (QPPs), the solution of the two modified primal problems of our algorithm reduces to two systems of linear equations. Moreover, our algorithm implements the structural risk minimization (SRM) principle by introducing a regularization term, along with minimizing the empirical risk. To test the efficacy and validity of the proposed method, numerical experiments on ten UCI benchmark data sets are performed. The results obtained further corroborate the effectiveness of the proposed algorithm.

Introduction

The support vector machine (SVM) approach, originally proposed by Vapnik and his colleagues [1], [2], [3], [4] for binary classification, is a promising machine learning technique when compared with other machine learning approaches, such as artificial neural networks [5]. SVM solves a quadratic programming problem (QPP), assuring that once an optimal solution is obtained, it is the unique (global) solution; it also implements the structural risk minimization principle, which minimizes the upper bound of the generalization error [6]. The basic idea of SVM is to find an optimal separating hyperplane with a maximum margin between two parallel support hyperplanes [7].

Different from the standard SVM, which uses a single hyperplane, Mangasarian and Wild [8] proposed the generalized eigenvalue proximal support vector machine (GEPSVM) for binary classification problems, which aims to generate two nonparallel hyperplanes such that each hyperplane is closer to its own class and as far as possible from the other class. This idea leads to solving two generalized eigenvalue problems, which in turn reduces the computational cost compared with SVM, which needs to solve one QPP [9].

Thereafter, a nonparallel hyperplane classifier termed twin support vector machine (TWSVM) was proposed by Jayadeva et al. [10] for binary classification, in light of GEPSVM. TWSVM generates two nonparallel hyperplanes by solving a pair of QPPs such that each hyperplane is closer to the patterns of one of the two classes and as far as possible from the other. Each QPP is smaller than the one traditionally found in SVMs, which makes TWSVM work almost four times faster than the standard SVM classifier.
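
For reference, the first of the pair of TWSVM primal problems (the second is symmetric, with the roles of the two classes exchanged) is usually written as follows, where $A$ and $B$ collect the samples of the two classes, $e_1$ and $e_2$ are vectors of ones, $\xi$ is a slack vector, and $c_1 > 0$ is a penalty parameter:

$$\min_{w_1, b_1, \xi} \; \frac{1}{2}\|A w_1 + e_1 b_1\|^2 + c_1 e_2^T \xi \quad \text{s.t.} \quad -(B w_1 + e_2 b_1) + \xi \ge e_2, \; \xi \ge 0.$$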

Due to its lower computational complexity, TWSVM has become one of the most popular methods of this kind. Many variants of TWSVM have been proposed, such as the twin bounded support vector machine (TBSVM) [11], ν-TSVM [12], robust TWSVM [13], least squares TWSVM [14], projection TWSVM [15], and twin support vector regression [16].

It is well known that one significant advantage of TBSVM [11] is the implementation of structural risk minimization by adding a regularization term in keeping with the idea of maximizing the margin. An effective method, called successive overrelaxation (SOR) [17], is applied to TBSVM in order to shorten training time. To avoid having to solve two quadratic programming problems, the least squares twin support vector machine (LSTSVM) was proposed by Kumar and Gopal [14]; it replaces the convex QPPs of TWSVM with convex linear systems by using the squared loss function instead of the hinge loss, leading to a very fast training speed [18].
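
To illustrate why the least squares variant trains so quickly, the following numpy sketch solves the two closed-form LSTSVM systems derived by Kumar and Gopal [14]. This is a minimal sketch, not the authors' implementation (the experiments in this paper were run in Matlab); A and B hold the samples of the two classes and c1, c2 are penalty parameters.

    import numpy as np

    def lstsvm_fit(A, B, c1, c2):
        """Least squares TWSVM: each hyperplane comes from one linear system."""
        e1 = np.ones((A.shape[0], 1))
        e2 = np.ones((B.shape[0], 1))
        E = np.hstack([A, e1])  # augmented samples of class +1
        F = np.hstack([B, e2])  # augmented samples of class -1

        # First hyperplane: close to class +1, pushed away from class -1.
        z1 = -np.linalg.solve(F.T @ F + (1.0 / c1) * (E.T @ E), F.T @ e2)
        # Second hyperplane: symmetric, with the roles of the classes swapped.
        z2 = np.linalg.solve(E.T @ E + (1.0 / c2) * (F.T @ F), E.T @ e1)
        return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])  # (w1, b1), (w2, b2)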

Several methods have incorporated a regularization term [9], [19], [20], [21], [22], which is utilized to avoid the singularity problem and reach a better generalization ability, similarly to TBSVM [11]. In the robust and sparse linear programming TWSVM [19], in addition to incorporating a regularization term, the Newton–Armijo algorithm was used for solving a pair of exterior penalty problems, and the 1-norm replaced the 2-norm in order to generate a robust solution. However, this formulation was not extended to multi-class problems. Shao et al. [22] proposed the least squares projection twin support vector machine (LSPTSVM) by considering equality constraints, with an extra regularization term introduced in the primal problem of LSPTSVM. However, LSPTSVM was limited to binary classification. Recently, several multi-class TWSVM approaches have been presented [6], [7], [23], [24], [25]. Inspired by LSPTSVM and by the multiple recursive projection twin support vector machine (MPTSVM) [26], which handles multiple classes, Yang et al. [9] presented an extension of LSPTSVM to multi-class problems that solves a series of linear equations instead of the complex QPPs of MPTSVM. This algorithm also deals with high-dimensional and large data sets, and shows great flexibility in modeling diverse sources of data.

Angulo et al. [27] proposed a new classification algorithm for multi-class problems, termed K-SVCR (Support Vector Classification-Regression for K classes). This learning algorithm, with ternary outputs $\{-1, 0, +1\}$, is based on Vapnik's support vector theory and evaluates all training samples in a "1-versus-1-versus-rest" structure during the decomposition phase by using a mixed classification and regression support vector machine. Compared with other multi-class classification algorithms, K-SVCR yields better generalization performance, since all samples are used in the construction of each classification hyperplane. An extension of K-SVCR to the multi-class TWSVM setting, termed Twin-KSVC, was proposed by Xu et al. [6]. Comparing the two algorithms on public data sets obtained from [28], it can be seen that in all cases Twin-KSVC outperformed K-SVCR, both in accuracy and in computational cost.
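
To make the "1-versus-1-versus-rest" decomposition concrete, the sketch below enumerates the $K(K-1)/2$ ternary subproblems: for each pair of classes $(i, j)$, samples of class $i$ are labeled $+1$, samples of class $j$ are labeled $-1$, and all remaining samples are labeled $0$. The function name is ours, for illustration only.

    import numpy as np
    from itertools import combinations

    def one_vs_one_vs_rest_tasks(y):
        """Yield ((i, j), ternary_labels) for each 1-vs-1-vs-rest subproblem.

        Every sample takes part in every subproblem, which is what gives
        K-SVCR-style methods their good generalization behavior.
        """
        y = np.asarray(y)
        for i, j in combinations(np.unique(y), 2):
            labels = np.zeros_like(y)
            labels[y == i] = 1
            labels[y == j] = -1
            yield (i, j), labels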

Recently, Nasiri et al. [23], motivated by the studies [6], [10], [14], [27], proposed a least squares version of Twin-KSVC, abbreviated LSTKSVC, demonstrating its effectiveness compared to K-SVCR, TWSVM and Twin-KSVC.

Motivated by the studies [6], [23], [27], in this paper we propose Improvements on least squares twin multi-class classification support vector machine (Improvements on LSTKSVC). The experimental results on UCI data sets show that the proposed Improvements on LSTKSVC algorithm achieves higher classification accuracy with a computational time similar to that of related methods from the literature. The highlights of our Improvements on LSTKSVC are the following:

  • The Sherman–Morrison–Woodbury (SMW) formula is employed to reduce the computational complexity of the nonlinear Improvements on LSTKSVC (the identity is recalled after this list).

  • The solution of our Improvements on LSTKSVC requires solving two systems of linear equations, unlike K-SVCR and Twin-KSVC, which require solving two QPPs.

  • Our proposed algorithm evaluates all the training points in a "1-versus-1-versus-rest" structure, similarly to K-SVCR, Twin-KSVC and LSTKSVC.
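
For reference, the SMW identity invoked in the first highlight expresses the inverse of a perturbed matrix through the inverse of a smaller one,

$$(A + UCV)^{-1} = A^{-1} - A^{-1} U \left(C^{-1} + V A^{-1} U\right)^{-1} V A^{-1},$$

which in the least squares TWSVM family is typically applied in the special case $(\varepsilon I + Z^T Z)^{-1} = \frac{1}{\varepsilon}\left(I - Z^T (\varepsilon I + Z Z^T)^{-1} Z\right)$, so that only the smaller of the two Gram matrices ever needs to be inverted.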

One main challenge was finding the best parameters via the grid search method; therefore, to reduce the computational complexity of parameter selection, we set $c_1 = c_3$, $c_2 = c_4$ and $c_5 = c_6$ in our algorithm.
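
A sketch of the resulting reduced grid search follows: with the ties $c_1 = c_3$, $c_2 = c_4$ and $c_5 = c_6$, only three penalty values have to be tuned (plus a kernel parameter in the nonlinear case). Here train_and_score is a hypothetical user-supplied callable, e.g. the cross-validation accuracy of the classifier under the given parameters.

    from itertools import product

    def reduced_grid_search(train_and_score, grid=None):
        """Grid search with the parameter ties c3 = c1, c4 = c2, c6 = c5."""
        if grid is None:
            # Powers of two are a common choice in the TWSVM literature.
            grid = [2.0 ** k for k in range(-5, 6)]
        best_params, best_score = None, float("-inf")
        for c1, c2, c5 in product(grid, repeat=3):
            # The ties shrink the search space from |grid|^6 to |grid|^3.
            params = dict(c1=c1, c2=c2, c3=c1, c4=c2, c5=c5, c6=c5)
            score = train_and_score(params)
            if score > best_score:
                best_params, best_score = params, score
        return best_params, best_score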

The remainder of this paper is organized as follows: Section (2) outlines SVM, TWSVM, K-SVCR and LSTKSVC and introduces the notation used in the rest of the paper. Section (3) introduces the linear and nonlinear Improvements on LSTKSVC. Section (4) deals with experimental results on ten benchmark data sets. Section (5) contains concluding remarks.

Section snippets

Support vector machine

SVMs represent a learning technique introduced in the framework of structural risk minimization (SRM) and the theory of Vapnik–Chervonenkis bounds [3]. It is a state-of-the-art machine learning algorithm. Among the several tutorials in the SVM literature, we refer to [29], [30], [31].

Given $m$ training pairs $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i \in \mathbb{R}^n$, $i = 1, 2, \ldots, m$, is an input vector labeled by $y_i \in \{-1, +1\}$, the linear SVM classifier searches for an optimal separating hyperplane $w^T x + b = 0$, where $w \in \mathbb{R}^n$ is
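
The snippet is truncated at this point; for context, the standard soft-margin SVM primal that this setup leads to is

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \ldots, m,$$

where $C > 0$ trades off the margin width against the empirical error.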

Improvements on least squares twin multi-class classification support vector machine

In this section, motivated by the study developed by Nasiri et al. [23], which proposes LSTKSVC, we propose our Improvements on least squares twin multi-class classification support vector machine. The structural risk minimization principle is implemented by introducing a regularization term, as done by Shao et al. [11]. Fig. 2 illustrates the Improvements on LSTKSVC. The present algorithm evaluates the training points in a "1-versus-1-versus-rest" structure. Let matrix $A \in \mathbb{R}^{m_1 \times n}$,
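
The snippet cuts off before the formulation. Purely to illustrate the "two systems of linear equations" claim from the abstract, the numpy sketch below solves a regularized least squares system of the kind the LSTKSVC family reduces to, with A, B and C holding the samples labeled $+1$, $-1$ and $0$, respectively. The coefficients here are assumptions for illustration, not the paper's exact primal problems.

    import numpy as np

    def solve_ternary_ls_system(A, B, C, c1, c2, reg, eps=0.1):
        """Illustrative LSTKSVC-style linear system (coefficients assumed).

        The term reg * I implements the SRM principle via regularization
        and also guarantees that the system is always solvable.
        """
        E = np.hstack([A, np.ones((A.shape[0], 1))])  # class +1, augmented
        F = np.hstack([B, np.ones((B.shape[0], 1))])  # class -1, augmented
        G = np.hstack([C, np.ones((C.shape[0], 1))])  # rest class, augmented
        e2 = np.ones((B.shape[0], 1))
        e3 = np.ones((C.shape[0], 1))

        lhs = E.T @ E + c1 * (F.T @ F) + c2 * (G.T @ G) + reg * np.eye(E.shape[1])
        rhs = -(c1 * (F.T @ e2) + c2 * (1 - eps) * (G.T @ e3))
        z1 = np.linalg.solve(lhs, rhs)  # z1 = [w1; b1]
        return z1[:-1], z1[-1]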

Numerical experiments

To evaluate the performance of our Improvements on LSTKSVC, in this section we investigate its performance on ten benchmark data sets from the UCI machine learning repository [28], described in Table 1. The samples are normalized before learning such that the features lie in the [0, 1] range. In our implementation, we focus on the comparison among Twin-KSVC, LSTKSVC and Improvements on LSTKSVC.
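
The [0, 1] scaling mentioned above is plain min-max normalization; a minimal sketch follows. Computing the statistics on the training set only and reusing them on the test set is common practice to avoid information leakage, though the paper does not spell out this detail.

    import numpy as np

    def minmax_fit_transform(X_train, X_test):
        """Scale each feature to [0, 1] using training-set statistics only."""
        lo, hi = X_train.min(axis=0), X_train.max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)  # guard constant features
        return (X_train - lo) / span, (X_test - lo) / span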

All the experiments have been implemented in Matlab 2015b on a personal computer (PC) with Intel

Conclusion

In this paper, we have proposed an algorithm, termed Improvements on LSTKSVC, that implements the structural risk minimization (SRM) principle by introducing a regularization term in the LSTKSVC version for multi-class classification problems, improving the generalization ability. Similarly to LSTKSVC and Twin-KSVC, our algorithm evaluates all the training data in a "1-versus-1-versus-rest" structure, so it generates ternary outputs $\{-1, 0, +1\}$. Differently from Twin-KSVC, which needs to solve

Acknowledgments

The authors would like to thank the editor and anonymous reviewers whose valuable comments and feedback have helped us to improve the content and presentation of the paper.


References (38)

  • C. Cortes et al., Support-vector networks, Mach. Learn. (1995)
  • V. Vapnik, The Nature of Statistical Learning Theory (2013)
  • V. Vapnik, Statistical Learning Theory (1998)
  • B.D. Ripley, Pattern Recognition and Neural Networks (2007)
  • Y. Xu et al., A twin multi-class classification support vector machine, Cognit. Comput. (2013)
  • Z.-X. Yang et al., Multiple birth support vector machine for multi-class classification, Neural Comput. Appl. (2013)
  • O.L. Mangasarian et al., Multisurface proximal support vector machine classification via generalized eigenvalues, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
  • Z.-M. Yang et al., Least squares recursive projection twin support vector machine for multi-class classification, Int. J. Mach. Learn. Cybernet. (2016)
  • Jayadeva et al., Twin support vector machines for pattern classification, IEEE Trans. Pattern Anal. Mach. Intell. (2007)

Márcio Lima received his bachelor's degree in Mathematics in 2001 and an M.Sc. degree in Mathematics in 2011. He is now a Ph.D. student in the College of Computer Science at Universidade Federal de Goiás, Goiânia, Brazil. His research interests include data mining, optimization methods, support vector machines and statistical methods. He is a professor at the Federal Institute of Education, Science and Technology of Goiás, Goiânia, Brazil.

Nattane Luíza da Costa received the M.Sc. degree in Computer Science in 2016 from the Universidade Federal de Goiás, Brazil. She is currently a Ph.D. student at the Department of Computer Science, Universidade Federal de Goiás, Brazil. Her field of research includes data mining, data preprocessing, feature selection and big data.

Rommel Barbosa is a doctor of Computer Science. He is an associate professor at Universidade Federal de Goiás, Brazil. His field of research includes data mining, machine learning, feature selection and big data.
