Neurocomputing

Volume 99, 1 January 2013, Pages 134-143

Bi-density twin support vector machines for pattern recognition

https://doi.org/10.1016/j.neucom.2012.06.012

Abstract

In this paper we present a classifier called bi-density twin support vector machines (BDTWSVMs) for data classification. In the training stage, BDTWSVMs first compute the relative density degrees of all training points using the intra-class graph, whose weights are determined by a local scaling heuristic strategy, and then optimize a pair of nonparallel hyperplanes through two smaller sized support vector machine (SVM)-type problems. In the prediction stage, BDTWSVMs assign the class label according to the kernel density degree-based distances from each test point to the two hyperplanes. BDTWSVMs not only inherit the good properties of twin support vector machines (TWSVMs) but also give a good description of the data points. Experimental results on toy as well as publicly available datasets indicate that BDTWSVMs compare favorably with classical SVMs and TWSVMs in terms of generalization.

Introduction

Support vector machines (SVMs) have attracted substantial interest in the machine learning and pattern recognition communities since their introduction [1], [2], [3]. As state-of-the-art classifiers, SVMs possess many striking advantages. First, SVMs directly implement the structural risk minimization (SRM) principle, in which the capacity of a learning machine is controlled so as to minimize a bound on the generalization error; in other words, SVMs try to find an optimal separating hyperplane with the maximum margin. Second, SVMs solve a quadratic programming problem (QPP), which guarantees that once a solution has been reached it is the unique (global) solution. Third, SVMs derive a sparse and robust solution by maximizing the margin between the two classes of points. Fourth, they have intuitive geometric interpretations for classification tasks. Furthermore, SVMs can easily be extended to nonlinear problems using kernel tricks. Recently, SVMs have been successfully applied in many fields [4], [5].
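For reference, the maximum-margin idea corresponds to the standard soft-margin SVM primal (a textbook formulation, stated here with labels recoded as $y_i \in \{-1,+1\}$):

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \qquad \text{s.t.}\quad y_i\left(w^{\top}x_i + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\ldots,n.$$

Its dual is a single QPP whose number of variables equals the number of training points, which is the main source of the computational burden discussed next.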

One of the main challenges in classical SVMs is their large computational complexity. Recently, Jayadeva et al. [6] proposed the twin support vector machines (TWSVMs) for binary data classification. TWSVMs aim to generate two nonparallel hyperplanes such that each one is closer to one class and at least one unit away from the other class for any given binary data set. The strategy of solving a pair of smaller sized QPPs instead of a single large one as in classical SVMs makes TWSVMs approximately four times faster to train than classical SVMs. In terms of generalization, TWSVMs compare favorably with classical SVMs. Extensions of TWSVMs include the least squares TWSVMs (LS-TWSVMs) [7], smooth TWSVMs [8], nonparallel-plane proximal classifiers (NPPCs) [9], [10], geometric algorithms [11], localized TWSVMs (LCTSVMs) [12], twin support vector regressions (TSVRs) [13], [14], twin parametric-margin SVMs (TPMSVMs) [15], and twin parametric insensitive support vector regressions (TPISVRs) [16].
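For reference, the linear TWSVM of [6] can be written, in the notation used later in this paper ($X_k$ collects the class-$k$ points as columns and $e_k$ denotes a vector of ones of length $n_k$), roughly as the following pair of QPPs:

$$\min_{w_1,\,b_1,\,\xi}\ \frac{1}{2}\left\|X_1^{\top}w_1 + e_1 b_1\right\|^2 + c_1 e_2^{\top}\xi \qquad \text{s.t.}\quad -\left(X_2^{\top}w_1 + e_2 b_1\right) + \xi \ge e_2,\quad \xi \ge 0,$$
$$\min_{w_2,\,b_2,\,\eta}\ \frac{1}{2}\left\|X_2^{\top}w_2 + e_2 b_2\right\|^2 + c_2 e_1^{\top}\eta \qquad \text{s.t.}\quad \left(X_1^{\top}w_2 + e_1 b_2\right) + \eta \ge e_1,\quad \eta \ge 0.$$

A test point is then assigned to the class whose hyperplane lies closer to it in perpendicular distance.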

The experimental results in [6] have shown that TWSVMs compare favorably with SVMs and generalized eigenvalue proximal SVMs (GEPSVMs) in terms of generalization performance. However, each hyperplane of TWSVMs, especially in the nonlinear case, may pass through as many data points as possible in the corresponding class. In a nutshell, the nonparallel hyperplanes of TWSVMs cannot effectively describe the data information. In addition, TWSVMs determine the class label of a test point according to its distances to the two hyperplanes, which also ignores the scatter information of the two classes of points. For instance, if the two classes of points have different class scatters, TWSVMs may obtain a poor generalization performance. Fig. 1 shows a toy example of this problem. In this toy example, the two classes of data points not only have totally different scatters, but the scatter of one class is also heteroscedastic, that is, it strongly depends on the input value. Fig. 1 (left) shows the possible learning result of the linear TWSVM. Intuitively, the result given in Fig. 1 (right) is a better choice since it successfully integrates the scatter information into the classification hyperplane.

In this paper, we present a novel classifier called the bi-density twin support vector machines (BDTWSVMs) for binary classification. In the training stage of BDTWSVMs, we extract the relative density degree of each data point from the weights of the intra-class graph of the training set, treat it as the relative weight (or margin) of this point in the optimization problems, and then construct a pair of nonparallel hyperplanes through two small sized QPPs. In the prediction stage of BDTWSVMs, we introduce a kernel density estimation (KDE) method [17] to calculate the relative density degree-based distances from test data points to the pair of nonparallel hyperplanes. BDTWSVMs successfully inherit the merits of TWSVMs, and they reduce to TWSVMs when the relative density degree of each point degenerates to one. More importantly, BDTWSVMs can effectively describe the characteristics of data points. First, BDTWSVMs use the weights of the intra-class graph of the training set to describe the relative density degrees in the training stage. These weights are calculated by the local scaling heuristic method [18] and can effectively reflect the local geometric manifold of the samples. Second, BDTWSVMs adopt kernel density degree-based distances to classify test points in the test stage, which can effectively describe the scatters of the two classes of data points. Computational comparisons with some classical pattern recognition methods in terms of generalization performance have been made on several artificial and benchmark datasets, indicating that BDTWSVMs show comparable generalization.
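As a rough illustration of the training-stage ingredients, the sketch below builds the intra-class weights with the local scaling heuristic of [18] (each bandwidth σ_i taken as the distance from x_i to its K-th nearest neighbor within its own class) and aggregates them into a per-point density score. The aggregation step and the function names are ours and purely illustrative; the paper's relative density degree is defined by its Eq. (21).

```python
import numpy as np

def local_scaling_weights(X, K=7):
    """Intra-class affinity weights via the local scaling heuristic:
    W[i, j] = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j)), where sigma_i is the
    distance from x_i to its K-th nearest neighbor within the same class.
    X holds one sample per row (n_k x m)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T       # squared Euclidean distances
    np.maximum(d2, 0.0, out=d2)
    d = np.sqrt(d2)
    k = min(K, n - 1)
    sigma = np.maximum(np.sort(d, axis=1)[:, k], 1e-12)  # sorted index 0 is the point itself
    W = np.exp(-d2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(W, 0.0)
    return W

def relative_density_degrees(X, K=7):
    """Illustrative aggregation only (NOT the paper's Eq. (21)): sum each row of
    the weight matrix and rescale so the degrees average to one, so that a point
    in a dense region gets a degree above one and an outlying point below one."""
    deg = local_scaling_weights(X, K).sum(axis=1)
    return deg * (len(deg) / max(deg.sum(), 1e-12))
```

With this convention, setting every relative density degree to one recovers the TWSVM special case mentioned above.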

The rest of this paper is organized as follows. Section 2 reviews related work on TWSVMs and kernel density estimation, and introduces the intra-class graph for extracting relative density degrees. Section 3 introduces the BDTWSVMs, including the linear and nonlinear cases, respectively. Section 4 discusses the connection between BDTWSVMs and other methods. Section 5 deals with the experimental results and Section 6 concludes this paper.

Section snippets

Related work

Let the binary samples to be classified be denoted by a set of $n$ column vectors $x_i,\ i=1,2,\ldots,n$, in the $m$-dimensional real space $\mathbb{R}^m$, and let $y_i \in \{1,2\}$ denote the class to which the $i$th sample belongs. Without loss of generality, we assume that the matrix $X$ of size $m \times n$ represents all training data points, and that the matrices $X_k$, $k=1,2$, of sizes $m \times n_k$ collect the points of the two classes, where $n = n_1 + n_2$. Further, we use $I$ to represent the index set of $X$, and $I_k$, $k=1,2$, to represent the indices of the two classes of

Linear BDTWSVM

Generally, the linear BDTWSVM also finds two nonparallel hyperplanes (1), which are optimized through the following two QPPs:

$$\min_{w_1,\,b_1,\,\xi}\ \frac{\nu_1}{2}\left(\|w_1\|^2 + b_1^2\right) + \frac{1}{2}\sum_{i \in I_1}\rho_i\left(w_1^{\top}x_i + b_1\right)^2 + c_1\sum_{j \in I_2}\xi_j$$
$$\text{s.t.}\quad w_1^{\top}x_j + b_1 \le -\rho_j + \xi_j,\quad \xi_j \ge 0,\quad j \in I_2,$$

$$\min_{w_2,\,b_2,\,\eta}\ \frac{\nu_2}{2}\left(\|w_2\|^2 + b_2^2\right) + \frac{1}{2}\sum_{j \in I_2}\rho_j\left(w_2^{\top}x_j + b_2\right)^2 + c_2\sum_{i \in I_1}\eta_i$$
$$\text{s.t.}\quad w_2^{\top}x_i + b_2 \ge \rho_i - \eta_i,\quad \eta_i \ge 0,\quad i \in I_1,$$

where $\rho_k$ is the relative density value of the point $x_k$ obtained by the relative density estimation method above (see Eq. (21)), and $c_1$, $c_2$, $\nu_1$, and $\nu_2$ are positive penalty factors given by the user.
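To make the optimization concrete, the following is a minimal numerical sketch that solves the first QPP above directly in its primal form with a generic QP solver (cvxopt is assumed to be available; the paper itself works with the dual). Function and parameter names are ours.

```python
import numpy as np
from cvxopt import matrix, solvers

def solve_first_bdtwsvm_qpp(X1, X2, rho1, rho2, nu1=1.0, c1=1.0):
    """Solve the first QPP in its primal form with a generic QP solver:
        min  nu1/2*(||w||^2 + b^2) + 1/2*sum_i rho1_i*(w'x_i + b)^2 + c1*sum_j xi_j
        s.t. w'x_j + b <= -rho2_j + xi_j,  xi_j >= 0   for x_j in class 2.
    X1 (n1 x m) and X2 (n2 x m) hold samples as rows; variable order is z = [w; b; xi]."""
    n1, m = X1.shape
    n2 = X2.shape[0]
    dim = m + 1 + n2
    A = np.hstack([X1, np.ones((n1, 1))])                    # rows [x_i', 1]
    P = np.zeros((dim, dim))
    P[:m + 1, :m + 1] = nu1 * np.eye(m + 1) + A.T @ (rho1[:, None] * A)
    P[m + 1:, m + 1:] = 1e-8 * np.eye(n2)                    # tiny ridge for a stable KKT system
    q = np.zeros(dim)
    q[m + 1:] = c1                                           # linear cost on the slacks
    G_top = np.hstack([X2, np.ones((n2, 1)), -np.eye(n2)])   # w'x_j + b - xi_j <= -rho2_j
    G_bot = np.hstack([np.zeros((n2, m + 1)), -np.eye(n2)])  # -xi_j <= 0
    G = np.vstack([G_top, G_bot])
    h = np.concatenate([-rho2, np.zeros(n2)])
    solvers.options['show_progress'] = False
    sol = solvers.qp(matrix(P), matrix(q.reshape(-1, 1)), matrix(G), matrix(h.reshape(-1, 1)))
    z = np.array(sol['x']).ravel()
    return z[:m], z[m], z[m + 1:]                            # w1, b1, slacks
```

The second QPP is handled symmetrically by swapping the roles of the two classes and reversing the sign of the constraint.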

Before optimizing the primal

Connection to other methods

In this section, we discuss the relationship between BDTWSVMs and some other methods.

Experiments

In this section, we give the evaluation results of BDTWSVMs in comparison with SVMs, TWSVMs, and LCTSVMs on both synthetic toy data sets and real-world benchmark data sets. Note that, for the nonlinear case, only the Gaussian kernel is employed in the simulations. In addition, we consider two types of TWSVMs: the first type (denoted as TWSVM-As) is that of [6], in which the regularization terms
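For concreteness, the Gaussian (RBF) kernel used in the nonlinear experiments is $K(x, x') = \exp(-\|x - x'\|^2 / (2\sigma^2))$; a small helper is sketched below (the parameterization and names are ours, and some papers absorb the factor 2 into $\sigma$).

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X (n x m) and Z (p x m)."""
    d2 = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Z ** 2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))
```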

Conclusions

In this paper, we have proposed the BDTWSVMs for binary classification. The proposed BDTWSVMs inherit good properties from TWSVMs; for instance, BDTWSVMs only need to optimize two small sized SVM-type QPPs. The difference between BDTWSVMs and TWSVMs is that the relative density degrees of all training points and the distances from the test points to the pair of hyperplanes are extracted by using the intra-class graph and kernel density estimation. The BDTWSVMs degenerate to classical TWSVMs if

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions. This work has been partly supported by the Innovative Project of Shanghai Municipal Education Commission (11YZ81), the foundation of SHNU (SK201204), and the Shanghai Leading Academic Discipline Project (S30405).


References (34)

  • M. Singh et al., Reduced twin support vector regression, Neurocomputing (2011)
  • H. Xue et al., Discriminatively regularized least-squares classification, Pattern Recognition (2009)
  • N. Cristianini et al., An Introduction to Support Vector Machines (2002)
  • V.N. Vapnik, The Nature of Statistical Learning Theory (1995)
  • V.N. Vapnik, Statistical Learning Theory (1998)
  • E. Osuna, R. Freund, F. Girosi, Training support vector machines: an application to face detection, in: Proceedings of...
  • T. Joachims, C. Nédellec, C. Rouveirol, Text categorization with support vector machines: learning with many relevant...

Xinjun Peng received the M.S. degree in mathematics from Yunnan University, Kunming, China, in 2005, and the Ph.D. degree from Shanghai University, Shanghai, China, in 2008. Currently he is a lecturer in the College of Information Science and Engineering, Guangxi University for Nationalities. His research interests are in the areas of applied mathematics, neural networks, statistical learning theory, and support vector machines.

Dong Xu received the M.S. degree in mathematics in 2002 and the Ph.D. degree in 2005, both from Shanghai University, Shanghai, China. Currently he is a lecturer in the Department of Mathematics, Shanghai Normal University. His research interests include applied mathematics, neural networks, and software design.
