Pattern Recognition

Volume 67, July 2017, Pages 32-46
Weighted linear loss multiple birth support vector machine based on information granulation for multi-class classification

https://doi.org/10.1016/j.patcog.2017.02.011

Highlights

  • This paper presents a weighted linear loss multiple birth support vector machine based on information granulation (WLMSVM).

  • WLMSVM divides the data into several granules and builds a set of sub-classifiers in the mixed granules.

  • By introducing the weighted linear loss, the proposed WLMSVM only needs to solve simple systems of linear equations.

  • The overall computational complexity of WLMSVM is lower than that of the multiple WLTSVM classifier.

Abstract

The recently proposed weighted linear loss twin support vector machine (WLTSVM) is an efficient algorithm for binary classification. However, the performance of the multiple WLTSVM classifier needs improvement, since it uses the "one-versus-rest" strategy, which has high computational complexity. This paper presents a weighted linear loss multiple birth support vector machine based on information granulation (WLMSVM) to enhance the performance of multiple WLTSVM. Inspired by granular computing, WLMSVM divides the data into several granules and builds a set of sub-classifiers in the mixed granules. By introducing the weighted linear loss, the proposed approach only needs to solve simple systems of linear equations. Moreover, since WLMSVM uses the "all-versus-one" strategy, which is the key idea of the multiple birth support vector machine, the overall computational complexity of WLMSVM is lower than that of multiple WLTSVM. The effectiveness of the proposed approach is demonstrated by experimental results on artificial datasets and benchmark datasets.

Introduction

The standard support vector machine (SVM) [1], which is based on statistical learning theory [2] and the Vapnik–Chervonenkis (VC) dimension, classifies two-category points by assigning them to one of two disjoint half spaces. SVM has drawn extensive attention from scholars [3], [4], [5], [6], [7], [8] and has been applied successfully to many fields [9], [10], [11], [12], [13]. Twin support vector machine (TWSVM) [14], an excellent extension of SVM, generates two nonparallel hyperplanes such that each plane is close to one of the two classes and as far as possible from the other. TWSVM assigns a new sample to a class depending on which hyperplane the sample is closer to. An illustration of the idea of TWSVM in two-dimensional space is shown in Fig. 1. TWSVM solves two small-scale quadratic programming problems (QPPs), whereas SVM solves a single QPP with a large number of constraints. Owing to this strategy, TWSVM is almost four times faster than standard SVM [15]. In the last several years, TWSVM has been studied extensively and greatly generalized [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27].

Recently, Shao et al. [28] proposed a novel extension of TWSVM, called the weighted linear loss twin support vector machine (WLTSVM). Unlike TWSVM, WLTSVM solves systems of linear equations. The two systems of linear equations in WLTSVM for binary classification can be solved efficiently by the well-known conjugate gradient algorithm, so WLTSVM can deal with large-scale datasets without any extra external optimizers.

Many real-world pattern recognition problems are multi-class classification problems [29], [30], [31], [32], [33], [34], [35], [36], [37], [38]. WLTSVM has also been extended to multi-class classification. However, multiple WLTSVM uses the "one-versus-rest" strategy, which has high computational complexity: it builds one binary WLTSVM classifier for each class, treating the samples of that class as positive and all the rest as negative. Thus, multiple WLTSVM does not retain the advantages of WLTSVM, namely its high performance and low computational complexity. Multiple birth support vector machine (MBSVM) [39] is another novel extension of TWSVM with high performance. MBSVM uses the "all-versus-one" strategy, which in turn considers one class as the negative class and all remaining classes as the positive class, generating a series of binary sub-classifiers to solve the multi-class classification problem. However, MBSVM needs to solve a series of QPPs.
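
To make the computational advantage of the weighted linear loss concrete, the following is a minimal sketch of solving one regularized linear system with the conjugate gradient method. The form of the system, the parameter c, and the helper name solve_plane are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's exact system): with a weighted linear loss,
# each hyperplane comes from a positive-definite linear system
# (G^T G + c*I) z = -H^T e, solvable by conjugate gradient with no QP solver.
import numpy as np
from scipy.sparse.linalg import cg

def solve_plane(G, H, c=1.0):
    """Return the augmented plane vector z = [w; b] for one sub-problem."""
    n = G.shape[1]
    lhs = G.T @ G + c * np.eye(n)        # regularized, symmetric positive definite
    rhs = -H.T @ np.ones(H.shape[0])     # e: vector of ones over the other class
    z, info = cg(lhs, rhs)
    assert info == 0, "conjugate gradient did not converge"
    return z[:-1], z[-1]                 # weight vector w and bias b

# A, B: samples of class +1 / -1, each augmented with a ones column for the bias.
A, B = np.random.randn(100, 5), np.random.randn(120, 5)
G = np.hstack([A, np.ones((100, 1))])
H = np.hstack([B, np.ones((120, 1))])
w, b = solve_plane(G, H)
```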

Several multi-class TWSVMs have been proposed. The strategies used to extend binary TWSVMs to the multi-class setting include one-versus-rest, one-versus-one, one-versus-one-versus-rest, binary tree, rest-versus-one, and directed acyclic graph (DAG). The one-versus-rest strategy is easy to understand and implement, but the complexity of one-versus-rest based methods is high. In general, one-versus-one based and DAG based multi-class TWSVMs achieve better classification accuracies than other methods; however, they need to build a large number of sub-classifiers, so when the number of classes is large they become complex systems. The complexity of one-versus-one-versus-rest based methods is higher than that of one-versus-rest based methods. The binary TWSVMs in rest-versus-one based methods take one class as the negative class and the rest of the classes as the positive class, so the numbers of constraints are small; the complexity is directly related to the number of constraints. Compared with the other approaches, the advantage of rest-versus-one based methods is their lower complexity. In this paper, we employ the rest-versus-one strategy to reduce the time complexity, as sketched below.
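
As a small illustration of why the sub-problem counts differ, the sketch below builds the K training splits that a rest-versus-one method would use; the function name and data layout are assumptions for illustration only.

```python
# Sketch: rest-versus-one produces exactly K binary sub-problems for K classes
# (versus K(K-1)/2 for one-versus-one). In each sub-problem, class k alone is
# the negative class, so the "one" side contributes few constraints.
import numpy as np

def rest_versus_one_splits(X, y):
    """Yield (positive samples, negative samples) for each sub-classifier."""
    for k in np.unique(y):
        yield X[y != k], X[y == k]

X, y = np.random.randn(60, 4), np.repeat([0, 1, 2], 20)
print(sum(1 for _ in rest_versus_one_splits(X, y)))  # -> 3 sub-problems
```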

Granular computing [40], [41], [42], which covers all research on the theories, methods, techniques and tools of granulation, is a powerful approach to handling large-scale information. Its essence is to replace an exact solution with a simple, low-cost approximate one, using imprecise, large-scale information, so that intelligent systems and intelligent control achieve tractability, robustness, low cost, and a better description of the real world. The combination of granular computing with statistical learning theory is becoming a hotspot, and many effective granular SVM (GSVM) models for binary classification have been developed [41]. Wang et al. [44] proposed a GSVM model based on a mixed measure; Ding et al. [45] proposed a fast fuzzy support vector machine based on information granulation; Cheng et al. [46] proposed a dynamic GSVM. However, combining granular computing with extensions of TWSVM for multi-class classification remains an open research problem.

This paper proposes a new classifier for multi-class classification, called the weighted linear loss multiple birth support vector machine based on information granulation (WLMSVM), to enhance the performance of multiple WLTSVM. The proposed algorithm works as follows. First, it splits the whole feature space into a set of information granules based on the training data, and labels each granule as a "pure granule" or a "mixed granule" according to the class labels of the training samples it contains. A pure granule is a subspace containing samples of only one class; a mixed granule is a subspace in which two or more classes are present. WLMSVM then builds one multi-class sub-classifier in each mixed granule. Unlike multiple WLTSVM, our approach uses the "all-versus-one" strategy, the key idea of MBSVM: in a given mixed granule, the sub-classifier generates one hyperplane for each class by solving a TWSVM-style problem that treats the samples of that class as negative and all other samples in the granule as positive. The last step is to predict the label of an unlabeled sample. Compared with other methods for multi-class classification, the proposed approach has three advantages: 1) by adopting the "all-versus-one" strategy, our approach as a whole has low computational complexity, and especially when the number of classes is large, WLMSVM can work faster than most other methods; 2) WLMSVM keeps the advantage of the multiple WLTSVM classifier: it uses a weighted linear loss instead of the hinge loss, so it only needs to solve several systems of linear equations; 3) the granular computing technique frees each sub-classifier to focus on the local information of the data in its granule. A sketch of the granulation step follows.
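
The sketch below illustrates the granulate-then-label step under stated assumptions: the paper's own splitting scheme is detailed in Section 3, so the k-means clustering here is only an illustrative stand-in for the granulation rule.

```python
# Illustrative granulation pipeline: split the space, then mark each granule
# "pure" (one class) or "mixed" (two or more classes). k-means is a stand-in
# for the paper's actual splitting scheme (Section 3).
import numpy as np
from sklearn.cluster import KMeans

def granulate(X, y, n_granules=4):
    ids = KMeans(n_clusters=n_granules, n_init=10).fit_predict(X)
    granules = []
    for g in range(n_granules):
        m = ids == g
        kind = "pure" if np.unique(y[m]).size == 1 else "mixed"
        granules.append({"X": X[m], "y": y[m], "kind": kind})
    return granules

# Sub-classifiers are trained only in "mixed" granules; a "pure" granule
# simply predicts its single class label.
X, y = np.random.randn(80, 2), np.repeat([0, 1, 2, 3], 20)
for g in granulate(X, y):
    print(g["kind"], len(g["y"]))
```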

The rest of this article is organized as follows. The next section provides a brief review of TWSVM, WLTSVM, MBSVM and granular computing. Section 3 introduces the proposed weighted linear loss multiple birth support vector machine based on information granulation for multi-class classification in detail. Experimental results are given in Section 4. The last section presents concluding remarks and directions for further research.

Section snippets

Twin support vector machine

Assume a binary classification problem with m samples in the n-dimensional real space R^n. The set of training data points is represented by T = {(x_i, y_i) | i = 1, 2, …, m}, where x_i ∈ R^n is an input sample and y_i ∈ {+1, −1} is the corresponding output. Let the m1 × n matrix A denote the samples belonging to class +1 and the m2 × n matrix B denote the samples belonging to class −1: each row of A is a sample of class +1, and each row of B is a sample of class −1.
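
As a small illustration of this notation, the snippet below forms A and B from a labeled sample set; the data here are synthetic placeholders.

```python
# Forming the class matrices A (m1 x n) and B (m2 x n) from the training set T.
import numpy as np

X = np.random.randn(10, 3)                          # m = 10 samples in R^3
y = np.array([1, 1, -1, 1, -1, -1, 1, -1, 1, -1])   # labels y_i in {+1, -1}
A = X[y == +1]   # each row: a sample of class +1  (shape m1 x n)
B = X[y == -1]   # each row: a sample of class -1  (shape m2 x n)
```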

For a linear binary classification

WLMSVM

In this section, we present the weighted linear loss multiple birth support vector machine based on information granulation. The approach consists of three steps: the first is to split the feature space and build suitable information granules; the second is to train sub-classifiers in the mixed granules and combine them into WLMSVM as a whole; the last is to predict new samples. The overall flow of WLMSVM is shown in Fig. 2, and a sketch of the prediction rule is given below.
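
For the prediction step, the following hedged sketch applies the MBSVM-style decision rule inside one granule; the routing of a sample to its granule and the names used are illustrative assumptions.

```python
# Sketch of in-granule prediction under "all-versus-one": the k-th hyperplane
# is built to lie far from class k, so a new sample is assigned to the class
# whose hyperplane it is FARTHEST from.
import numpy as np

def predict_in_granule(x, planes):
    """planes: list of (w_k, b_k), one pair per class present in the granule."""
    dists = [abs(w @ x + b) / np.linalg.norm(w) for w, b in planes]
    return int(np.argmax(dists))   # index of the farthest plane wins

planes = [(np.array([1.0, 0.0]), 0.0), (np.array([0.0, 1.0]), -1.0)]
print(predict_in_granule(np.array([2.0, 0.5]), planes))  # -> 0
```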

Experiments and analysis

In order to test the performance of the proposed algorithm, we run a series of tests on three artificial datasets and several popular UCI datasets. All experiments are implemented on a personal computer with a 3.4 GHz Intel Core i5 CPU and 4 GB of memory, in the MATLAB 2012a environment. Unless otherwise specified, we use 10-fold cross-validation and report the average accuracy as the measure of classification accuracy (a sketch of this protocol follows). For the nonlinear case, these problems are tested
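
A minimal sketch of this evaluation protocol is given below; it assumes a classifier object exposing fit/predict methods and uses stratified folds, which is one common reading of "10-fold cross-validation".

```python
# Sketch: 10-fold cross-validation reporting the average accuracy, for any
# classifier with fit/predict (assumed interface, for illustration only).
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cv_accuracy(clf, X, y, folds=10):
    accs = []
    for tr, te in StratifiedKFold(n_splits=folds, shuffle=True).split(X, y):
        clf.fit(X[tr], y[tr])
        accs.append(np.mean(clf.predict(X[te]) == y[te]))
    return float(np.mean(accs))
```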

Conclusions

In order to further improve the performance of multiple WLTSVM and reduce the complexity of the method as a whole, this paper proposes the weighted linear loss multiple birth support vector machine based on information granulation. The proposed approach for multi-class classification is based on the weighted linear loss and the "all-versus-one" strategy. WLMSVM first splits the feature space into several information granules and then trains a sub-classifier with weighted linear loss in each mixed

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 61379101, 61672522), the National Key Basic Research Program of China (No. 2013CB329502), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET).


References (55)

  • X. Peng, TPMSVM: a novel twin parametric-margin support vector machine for pattern recognition, Pattern Recognit. (2011)

  • X. Hua et al., Weighted least squares projection twin support vector machines with local information, Neurocomputing (2015)

  • Y.H. Shao et al., Weighted linear loss twin support vector machine for large-scale classification, Knowl.-Based Syst. (2015)

  • J.A. Nasiri et al., Least squares twin multi-class classification support vector machine, Pattern Recognit. (2015)

  • X. Ju et al., Nonparallel hyperplanes support vector machine for multi-class classification, Procedia Comput. Sci. (2015)

  • M. Chu et al., Multi-class classification methods of enhanced LS-TWSVM for strip steel surface defects, J. Iron Steel Res. Int. (2014)

  • Y.H. Shao et al., The best separating decision tree twin support vector machine for multi-class classification, Procedia Comput. Sci. (2013)

  • Y. Tang et al., Granular support vector machines with association rules mining for protein homology prediction, Artif. Intell. Med. (2005)

  • F. Fernández-Navarro et al., A dynamic over-sampling procedure based on sensitivity for multi-class problems, Pattern Recognit. (2011)

  • Y. Sun et al., Cost-sensitive boosting for classification of imbalanced data, Pattern Recognit. (2007)

  • C. Huang et al., A GA-based feature selection and parameters optimization for support vector machines, Expert Syst. Appl. (2006)

  • Z. Wang et al., A GA-based model selection for smooth twin parametric-margin support vector machine, Pattern Recognit. (2013)

  • C. Cortes et al., Support-vector networks, Mach. Learn. (1995)

  • V.N. Vapnik, The Nature of Statistical Learning Theory (1998)

  • J. Suykens et al., Least squares support vector machine classifiers, Neural Process. Lett. (1999)

  • Y. Feng et al., Normalization of linear support vector machines, IEEE Trans. Signal Process. (2015)

  • B. Gu et al., Incremental support vector learning for ordinal regression, IEEE Trans. Neural Netw. Learn. Syst. (2015)

Shifei Ding, born in Qingdao, received his Ph.D. degree from Shandong University of Science and Technology in 2004. He completed his postdoctoral work at the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, in 2006. He is a professor and Ph.D. supervisor at China University of Mining and Technology. His research interests include intelligent information processing, pattern recognition, machine learning, data mining, and granular computing. He has published 5 books and more than 150 research papers in journals and international conferences.

Xiekai Zhang received his B.Sc. degree in computer science from China University of Mining and Technology in 2013, and is currently pursuing the M.Sc. degree in the School of Computer Science and Technology, China University of Mining and Technology. His research interests include machine learning, pattern recognition, support vector machines, and various applications.

Xuanyue An received her B.Sc. degree in computer science from Jiangsu Normal University in 2015, and is currently pursuing the M.Sc. degree in the School of Computer Science and Technology, China University of Mining and Technology. Her research interests include machine learning, pattern recognition, support vector machines, and various applications.

Yu Xue received his B.Sc. degree in computer science from China University of Mining and Technology in 2013, and is currently pursuing the Ph.D. degree in the School of Computer Science and Technology, China University of Mining and Technology. His research interests include machine learning, pattern recognition, support vector machines, and various applications.
