Pattern Recognition

Volume 67, July 2017, Pages 32-46
Weighted linear loss multiple birth support vector machine based on information granulation for multi-class classification

https://doi.org/10.1016/j.patcog.2017.02.011

Highlights

  • This paper presents a weighted linear loss multiple birth support vector machine based on information granulation (WLMSVM).

  • WLMSVM divides the data into several granules and builds a set of sub-classifiers in the mixed granules.

  • By introducing the weighted linear loss, the proposed WLMSVM only needs to solve simple systems of linear equations.

  • The overall computational complexity of WLMSVM is lower than that of the multiple WLTSVM classifier.

Abstract

The recently proposed weighted linear loss twin support vector machine (WLTSVM) is an efficient algorithm for binary classification. However, the performance of the multiple WLTSVM classifier needs improvement, since it uses the "one-versus-rest" strategy, which has high computational complexity. This paper presents a weighted linear loss multiple birth support vector machine based on information granulation (WLMSVM) to enhance the performance of multiple WLTSVM. Inspired by granular computing, WLMSVM divides the data into several granules and builds a set of sub-classifiers in the mixed granules. By introducing the weighted linear loss, the proposed approach only needs to solve simple systems of linear equations. Moreover, since WLMSVM uses the "all-versus-one" strategy, which is the key idea of the multiple birth support vector machine, the overall computational complexity of WLMSVM is lower than that of multiple WLTSVM. The effectiveness of the proposed approach is demonstrated by experimental results on artificial datasets and benchmark datasets.

Introduction

The standard support vector machine (SVM) [1], which is based on statistical learning theory [2] and the Vapnik–Chervonenkis (VC) dimension, classifies two-category points by assigning them to one of two disjoint half spaces. SVM has drawn extensive attention from scholars [3], [4], [5], [6], [7], [8] and has been applied successfully to many fields [9], [10], [11], [12], [13]. Twin support vector machine (TWSVM) [14], an excellent extension of SVM, generates two nonparallel hyperplanes such that each plane is close to one of the two classes and as far as possible from the other. TWSVM assigns a new sample to a class depending on which hyperplane the sample is closer to. An illustration of the idea of TWSVM in two-dimensional space is shown in Fig. 1. TWSVM solves two small-scale quadratic programming problems (QPPs), whereas SVM solves a single QPP with a large number of constraints. Owing to this strategy, TWSVM is almost four times faster than standard SVM [15]. In the last several years, TWSVM has been studied extensively and greatly generalized [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27].

Recently, Shao et al. [28] proposed a novel extension of TWSVM, called the weighted linear loss twin support vector machine (WLTSVM). Unlike TWSVM, WLTSVM solves systems of linear equations. The two systems of linear equations in WLTSVM for binary classification can be solved efficiently by the well-known conjugate gradient algorithm, so WLTSVM can deal with large-scale datasets without any extra external optimizers.

Many real-world pattern recognition problems are multi-class classification problems [29], [30], [31], [32], [33], [34], [35], [36], [37], [38]. WLTSVM has also been extended to multi-class classification. However, multiple WLTSVM uses the "one-versus-rest" strategy, which has high computational complexity: it builds one binary WLTSVM classifier for each class, treating the samples of that class as positive and all the rest as negative. Thus, multiple WLTSVM does not retain the advantages of WLTSVM, namely its high performance and low computational complexity. Multiple birth support vector machine (MBSVM) [39] is another novel extension of TWSVM with high performance. MBSVM uses the "all-versus-one" strategy, which in turn considers one class as the negative class and all remaining classes as the positive class, generating a series of binary sub-classifiers to solve the multi-class classification problem. However, MBSVM needs to solve a series of QPPs.
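
To make the computational advantage of the weighted linear loss concrete, the following is a minimal sketch of solving one regularized linear system with the conjugate gradient method. The form of the system, the parameter c, and the helper name solve_plane are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's exact system): with a weighted linear loss,
# each hyperplane comes from a positive-definite linear system
# (G^T G + c*I) z = -H^T e, solvable by conjugate gradient with no QP solver.
import numpy as np
from scipy.sparse.linalg import cg

def solve_plane(G, H, c=1.0):
    """Return the augmented plane vector z = [w; b] for one sub-problem."""
    n = G.shape[1]
    lhs = G.T @ G + c * np.eye(n)        # regularized, symmetric positive definite
    rhs = -H.T @ np.ones(H.shape[0])     # e: vector of ones over the other class
    z, info = cg(lhs, rhs)
    assert info == 0, "conjugate gradient did not converge"
    return z[:-1], z[-1]                 # weight vector w and bias b

# A, B: samples of class +1 / -1, each augmented with a ones column for the bias.
A, B = np.random.randn(100, 5), np.random.randn(120, 5)
G = np.hstack([A, np.ones((100, 1))])
H = np.hstack([B, np.ones((120, 1))])
w, b = solve_plane(G, H)
```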

Several multi-class TWSVMs have been proposed. The strategies used to extend binary TWSVMs to the multi-class setting include one-versus-rest, one-versus-one, one-versus-one-versus-rest, binary tree, rest-versus-one, and directed acyclic graph (DAG). The one-versus-rest strategy is easy to understand and implement, but the complexity of one-versus-rest based methods is high. In general, one-versus-one based and DAG based multi-class TWSVMs achieve better classification accuracies than other methods; however, they need to build a large number of sub-classifiers, so when the number of classes is large they become complex systems. The complexity of one-versus-one-versus-rest based methods is higher than that of one-versus-rest based methods. The binary TWSVMs in rest-versus-one based methods take one class as the negative class and the rest of the classes as the positive class, so the numbers of constraints are small; the complexity is directly related to the number of constraints. Compared with the other approaches, the advantage of rest-versus-one based methods is their lower complexity. In this paper, we employ the rest-versus-one strategy to reduce the time complexity, as sketched below.
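
As a small illustration of why the sub-problem counts differ, the sketch below builds the K training splits that a rest-versus-one method would use; the function name and data layout are assumptions for illustration only.

```python
# Sketch: rest-versus-one produces exactly K binary sub-problems for K classes
# (versus K(K-1)/2 for one-versus-one). In each sub-problem, class k alone is
# the negative class, so the "one" side contributes few constraints.
import numpy as np

def rest_versus_one_splits(X, y):
    """Yield (positive samples, negative samples) for each sub-classifier."""
    for k in np.unique(y):
        yield X[y != k], X[y == k]

X, y = np.random.randn(60, 4), np.repeat([0, 1, 2], 20)
print(sum(1 for _ in rest_versus_one_splits(X, y)))  # -> 3 sub-problems
```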

Granular computing [40], [41], [42], which covers all research on the theories, methods, techniques and tools of granulation, is a powerful approach to handling large-scale information. Its essence is to replace an exact solution with a simple, low-cost approximate one, using imprecise, large-scale information, so that intelligent systems and intelligent control achieve tractability, robustness, low cost, and a better description of the real world. The combination of granular computing with statistical learning theory is becoming a hotspot, and many effective granular SVM (GSVM) models for binary classification have been developed [41]. Wang et al. [44] proposed a GSVM model based on a mixed measure; Ding et al. [45] proposed a fast fuzzy support vector machine based on information granulation; Cheng et al. [46] proposed a dynamic GSVM. However, combining granular computing with extensions of TWSVM for multi-class classification remains an open research problem.

This paper proposes a new classifier for multi-class classification, called the weighted linear loss multiple birth support vector machine based on information granulation (WLMSVM), to enhance the performance of multiple WLTSVM. The proposed algorithm works as follows. First, it splits the whole feature space into a set of information granules based on the training data, and labels each granule as a "pure granule" or a "mixed granule" according to the class labels of the training samples it contains. A pure granule is a subspace containing samples of only one class; a mixed granule is a subspace in which two or more classes are present. WLMSVM then builds one multi-class sub-classifier in each mixed granule. Unlike multiple WLTSVM, our approach uses the "all-versus-one" strategy, the key idea of MBSVM: in a given mixed granule, the sub-classifier generates one hyperplane for each class by solving a TWSVM-style problem that treats the samples of that class as negative and all other samples in the granule as positive. The last step is to predict the label of an unlabeled sample. Compared with other methods for multi-class classification, the proposed approach has three advantages: 1) by adopting the "all-versus-one" strategy, our approach as a whole has low computational complexity, and especially when the number of classes is large, WLMSVM can work faster than most other methods; 2) WLMSVM keeps the advantage of the multiple WLTSVM classifier: it uses a weighted linear loss instead of the hinge loss, so it only needs to solve several systems of linear equations; 3) the granular computing technique frees each sub-classifier to focus on the local information of the data in its granule. A sketch of the granulation step follows.
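
The sketch below illustrates the granulate-then-label step under stated assumptions: the paper's own splitting scheme is detailed in Section 3, so the k-means clustering here is only an illustrative stand-in for the granulation rule.

```python
# Illustrative granulation pipeline: split the space, then mark each granule
# "pure" (one class) or "mixed" (two or more classes). k-means is a stand-in
# for the paper's actual splitting scheme (Section 3).
import numpy as np
from sklearn.cluster import KMeans

def granulate(X, y, n_granules=4):
    ids = KMeans(n_clusters=n_granules, n_init=10).fit_predict(X)
    granules = []
    for g in range(n_granules):
        m = ids == g
        kind = "pure" if np.unique(y[m]).size == 1 else "mixed"
        granules.append({"X": X[m], "y": y[m], "kind": kind})
    return granules

# Sub-classifiers are trained only in "mixed" granules; a "pure" granule
# simply predicts its single class label.
X, y = np.random.randn(80, 2), np.repeat([0, 1, 2, 3], 20)
for g in granulate(X, y):
    print(g["kind"], len(g["y"]))
```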

The rest of this article is organized as follows. The next section provides a brief review of TWSVM, WLTSVM, MBSVM and granular computing. Section 3 introduces the proposed weighted linear loss multiple birth support vector machine based on information granulation for multi-class classification in detail. Experimental results are given in Section 4. The last section presents concluding remarks and directions for further research.

Section snippets

Twin support vector machine

Assume a binary classification problem with m samples in the n-dimensional real space R^n. The set of training data points is represented by T = {(x_i, y_i) | i = 1, 2, …, m}, where x_i ∈ R^n is an input sample and y_i ∈ {+1, −1} is the corresponding output. Let the m1 × n matrix A denote the samples belonging to class +1 and the m2 × n matrix B denote the samples belonging to class −1: each row of A is a sample of class +1, and each row of B is a sample of class −1.
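
As a small illustration of this notation, the snippet below forms A and B from a labeled sample set; the data here are synthetic placeholders.

```python
# Forming the class matrices A (m1 x n) and B (m2 x n) from the training set T.
import numpy as np

X = np.random.randn(10, 3)                          # m = 10 samples in R^3
y = np.array([1, 1, -1, 1, -1, -1, 1, -1, 1, -1])   # labels y_i in {+1, -1}
A = X[y == +1]   # each row: a sample of class +1  (shape m1 x n)
B = X[y == -1]   # each row: a sample of class -1  (shape m2 x n)
```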

For a linear binary classification

WLMSVM

In this section, we present the weighted linear loss multiple birth support vector machine based on information granulation. The approach consists of three steps: the first is to split the feature space and build suitable information granules; the second is to train sub-classifiers in the mixed granules and combine them into WLMSVM as a whole; the last is to predict new samples. The overall flow of WLMSVM is shown in Fig. 2, and a sketch of the prediction rule is given below.
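
For the prediction step, the following hedged sketch applies the MBSVM-style decision rule inside one granule; the routing of a sample to its granule and the names used are illustrative assumptions.

```python
# Sketch of in-granule prediction under "all-versus-one": the k-th hyperplane
# is built to lie far from class k, so a new sample is assigned to the class
# whose hyperplane it is FARTHEST from.
import numpy as np

def predict_in_granule(x, planes):
    """planes: list of (w_k, b_k), one pair per class present in the granule."""
    dists = [abs(w @ x + b) / np.linalg.norm(w) for w, b in planes]
    return int(np.argmax(dists))   # index of the farthest plane wins

planes = [(np.array([1.0, 0.0]), 0.0), (np.array([0.0, 1.0]), -1.0)]
print(predict_in_granule(np.array([2.0, 0.5]), planes))  # -> 0
```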

Experiments and analysis

In order to test the performance of the proposed algorithm, we run a series of tests on three artificial datasets and several popular UCI datasets. All experiments are implemented on a personal computer with a 3.4 GHz Intel Core i5 CPU and 4 GB of memory, in the MATLAB 2012a environment. Unless otherwise specified, we use 10-fold cross-validation and report the average accuracy as the measure of classification accuracy (a sketch of this protocol follows). For the nonlinear case, these problems are tested
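
A minimal sketch of this evaluation protocol is given below; it assumes a classifier object exposing fit/predict methods and uses stratified folds, which is one common reading of "10-fold cross-validation".

```python
# Sketch: 10-fold cross-validation reporting the average accuracy, for any
# classifier with fit/predict (assumed interface, for illustration only).
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cv_accuracy(clf, X, y, folds=10):
    accs = []
    for tr, te in StratifiedKFold(n_splits=folds, shuffle=True).split(X, y):
        clf.fit(X[tr], y[tr])
        accs.append(np.mean(clf.predict(X[te]) == y[te]))
    return float(np.mean(accs))
```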

Conclusions

In order to further improve the performance of multiple WLTSVM and reduce the complexity of the method as a whole, this paper proposes the weighted linear loss multiple birth support vector machine based on information granulation. The proposed approach for multi-class classification is based on the weighted linear loss and the "all-versus-one" strategy. WLMSVM first splits the feature space into several information granules and then trains a sub-classifier with weighted linear loss in each mixed

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 61379101, 61672522), the National Key Basic Research Program of China (No. 2013CB329502), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET).


References (55)

  • X. Peng, TPMSVM: a novel twin parametric-margin support vector machine for pattern recognition, Pattern Recognit. (2011)

  • X. Hua et al., Weighted least squares projection twin support vector machines with local information, Neurocomputing (2015)

  • Y.H. Shao et al., Weighted linear loss twin support vector machine for large-scale classification, Knowl.-Based Syst. (2015)

  • J.A. Nasiri et al., Least squares twin multi-class classification support vector machine, Pattern Recognit. (2015)

  • X. Ju et al., Nonparallel hyperplanes support vector machine for multi-class classification, Procedia Comput. Sci. (2015)

  • M. Chu et al., Multi-class classification methods of enhanced LS-TWSVM for strip steel surface defects, J. Iron Steel Res. Int. (2014)

  • Y.H. Shao et al., The best separating decision tree twin support vector machine for multi-class classification, Procedia Comput. Sci. (2013)

  • Y. Tang et al., Granular support vector machines with association rules mining for protein homology prediction, Artif. Intell. Med. (2005)

  • F. Fernández-Navarro et al., A dynamic over-sampling procedure based on sensitivity for multi-class problems, Pattern Recognit. (2011)

  • Y. Sun et al., Cost-sensitive boosting for classification of imbalanced data, Pattern Recognit. (2007)

  • C. Huang et al., A GA-based feature selection and parameters optimization for support vector machines, Expert Syst. Appl. (2006)

  • Z. Wang et al., A GA-based model selection for smooth twin parametric-margin support vector machine, Pattern Recognit. (2013)

  • C. Cortes et al., Support-vector networks, Mach. Learn. (1995)

  • V.N. Vapnik, The Nature of Statistical Learning Theory (1998)

  • J. Suykens et al., Least squares support vector machine classifiers, Neural Process. Lett. (1999)

  • Y. Feng et al., Normalization of linear support vector machines, IEEE Trans. Signal Process. (2015)

  • B. Gu et al., Incremental support vector learning for ordinal regression, IEEE Trans. Neural Netw. Learn. Syst. (2015)

Shifei Ding, born in Qingdao, received his Ph.D. degree from Shandong University of Science and Technology in 2004. He completed his postdoctoral work at the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, in 2006. He is a professor and Ph.D. supervisor at China University of Mining and Technology. His research interests include intelligent information processing, pattern recognition, machine learning, data mining, and granular computing. He has published 5 books and more than 150 research papers in journals and international conferences.

Xiekai Zhang received his B.Sc. degree in computer science from China University of Mining and Technology in 2013, and is currently pursuing the M.Sc. degree in the School of Computer Science and Technology, China University of Mining and Technology. His research interests include machine learning, pattern recognition, support vector machines, and various applications.

Xuanyue An received her B.Sc. degree in computer science from Jiangsu Normal University in 2015, and is currently pursuing the M.Sc. degree in the School of Computer Science and Technology, China University of Mining and Technology. Her research interests include machine learning, pattern recognition, support vector machines, and various applications.

Yu Xue received his B.Sc. degree in computer science from China University of Mining and Technology in 2013, and is currently pursuing the Ph.D. degree in the School of Computer Science and Technology, China University of Mining and Technology. His research interests include machine learning, pattern recognition, support vector machines, and various applications.
