Bi-density twin support vector machines for pattern recognition
Introduction
Support vector machines (SVMs) have attracted substantial interest in the machine learning and pattern recognition communities since their introduction [1], [2], [3]. As state-of-the-art classifiers, SVMs possess many striking advantages. First, SVMs directly implement the structural risk minimization (SRM) principle, in which the capacity of a learning machine is controlled so as to minimize a bound on the generalization error; in other words, SVMs seek an optimal separating hyperplane with the maximum margin. Second, SVMs solve a quadratic programming problem (QPP), ensuring that once a solution has been reached, it is the unique (global) solution. Third, SVMs derive a sparse and robust solution by maximizing the margin between the two classes of points. Fourth, SVMs have intuitive geometric interpretations for classification tasks. Furthermore, SVMs can easily be extended to nonlinear problems via the kernel trick. Recently, SVMs have been successfully applied in many fields [4], [5].
One of the main challenges in classical SVMs is their large computational complexity. Recently, Jayadeva et al. [6] proposed the twin support vector machines (TWSVMs) for binary data classification. TWSVMs generate two nonparallel hyperplanes such that each one is closer to one class and at least one unit away from the other class for any given binary data set. Solving a pair of smaller-sized QPPs, instead of one large QPP as in classical SVMs, makes TWSVMs approximately four times faster to train than classical SVMs. In terms of generalization, TWSVMs compare favorably with classical SVMs. Extensions to TWSVMs include the least squares TWSVMs (LS-TWSVMs) [7], smooth TWSVMs [8], nonparallel-plane proximal classifiers (NPPCs) [9], [10], geometric algorithms [11], localized TWSVMs (LCTSVMs) [12], twin support vector regressions (TSVRs) [13], [14], twin parametric-margin SVMs (TPMSVMs) [15], and twin parametric insensitive support vector regressions (TPISVRs) [16].
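For reference, the pair of QPPs solved by the linear TWSVM takes the following standard form (a sketch following the formulation in [6]; A and B stack the points of class +1 and class −1 rowwise, e_1 and e_2 are vectors of ones, and c_1 is a penalty parameter):

```latex
% First TWSVM primal problem: the hyperplane (w_1, b_1) lies close to
% class +1 (the A term) while pushing class -1 at least one unit away
% (the constraint on B), with slacks q penalized by c_1.
\begin{aligned}
\min_{w_1,\, b_1,\, q} \quad & \tfrac{1}{2}\,\lVert A w_1 + e_1 b_1 \rVert^2 + c_1\, e_2^\top q \\
\text{s.t.} \quad & -(B w_1 + e_2 b_1) + q \ge e_2, \qquad q \ge 0.
\end{aligned}
```

The second problem for (w_2, b_2) is obtained symmetrically by exchanging the roles of A and B.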
The experimental results in [6] show that TWSVMs compare favorably with SVMs and GEPSVMs in terms of generalization performance. However, each hyperplane of TWSVMs, especially in the nonlinear case, is required to pass through as many data points of the corresponding class as possible. In a nutshell, the nonparallel hyperplanes of TWSVMs cannot effectively describe the data information. In addition, TWSVMs assign the class label of a test point according to its distances to the two hyperplanes, which likewise ignores the information of the two classes of points. For instance, if the two classes of points have different class scatters, TWSVMs may yield poor generalization performance. Fig. 1 shows a toy example of this problem: not only do the two classes of data points have totally different scatters, but the scatter of one class is also heteroscedastic, that is, the scatter strongly depends on the input value. Fig. 1 (left) shows a possible learning result of the linear TWSVM. Intuitively, the result in Fig. 1 (right) is a better choice, since it successfully integrates the scatter information into the classification hyperplane.
In this paper, we present a novel classifier called the bi-density twin support vector machine (BDTWSVM) for binary classification. In the training stage, BDTWSVMs extract the relative density degree of each data point from the weights of the intra-class graph of the training set, treat it as the relative weight (or margin) of that point in the optimization problems, and then construct a pair of nonparallel hyperplanes through two small-sized QPPs. In the prediction stage, BDTWSVMs introduce a kernel density estimation (KDE) method [17] to calculate relative density degree-based distances from test points to the pair of nonparallel hyperplanes. BDTWSVMs inherit the merits of TWSVMs, and reduce to TWSVMs as a special case when the relative density degree of each point degenerates to one. More importantly, BDTWSVMs can effectively describe the characteristics of the data. First, in the training stage, BDTWSVMs use the weights of the intra-class graph of the training set to describe the relative density degrees; these weights are calculated by the local scaling heuristic [18] and effectively reflect the local manifold geometry of the samples. Second, in the test stage, BDTWSVMs adopt kernel density degree-based distances to classify test points, which can effectively describe the scatters of the two classes of data points. Computational comparisons with some classical pattern recognition methods on several artificial and benchmark data sets indicate that BDTWSVMs achieve comparable generalization performance.
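The first ingredient above, relative density degrees from an intra-class graph with local-scaling weights, can be sketched as follows. This is an illustrative reading, not the paper's exact procedure: the function name, the neighbour count k, and the max-normalization are our assumptions; the affinity w_ij = exp(−d_ij² / (σ_i σ_j)) follows the local scaling heuristic of [18].

```python
import numpy as np

def relative_density_degrees(X, k=7):
    """Sketch: relative density degree of each point of ONE class.

    X : (n, m) array of samples from a single class.
    k : neighbour index used for the local scale (assumed choice).
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # local scale sigma_i = distance to the k-th nearest neighbour
    # (row 0 of the sorted distances is the self-distance 0)
    sigma = np.sqrt(np.sort(d2, axis=1)[:, min(k, n - 1)])
    sigma = np.maximum(sigma, 1e-12)          # guard against zero scales
    # local-scaling affinities of the intra-class graph
    W = np.exp(-d2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(W, 0.0)
    # degree of each node; normalize so the densest point gets degree 1
    deg = W.sum(axis=1)
    return deg / deg.max()
```

Points in dense regions then receive degrees near 1, while outliers receive small degrees, which is the behaviour a relative weight (or margin) in the QPPs would need.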
The rest of this paper is organized as follows. Section 2 reviews related work on TWSVMs and kernel density estimation, and introduces the intra-class graph for extracting relative density degrees. Section 3 introduces the BDTWSVMs, covering the linear and nonlinear cases, respectively. Section 4 discusses the connection between BDTWSVMs and other methods. Section 5 presents the experimental results, and Section 6 concludes the paper.
Related work
Let the binary samples to be classified be denoted by a set of n column vectors in the m-dimensional real space ℝ^m, and let y_i ∈ {+1, −1} denote the class to which the i-th sample belongs. Without loss of generality, we assume that the matrix X of size m×n represents all training data points, and that the matrices X₁, X₂, with sizes m×n₁ and m×n₂, where n₁ + n₂ = n, represent the two classes. Further, we use I to denote the index set of X, and I₁, I₂ to denote the indices of the two classes of
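The matrix notation above can be sketched in code. The shapes and variable names below are illustrative (the section's exact symbols are partly elided here): X stores points as columns of size m×n, and the label vector splits it into the two class matrices.

```python
import numpy as np

# Illustrative setup: m features, n training points stored as COLUMNS of X,
# labels y in {+1, -1}. Sizes are arbitrary toy choices.
rng = np.random.default_rng(42)
m, n = 2, 10
X = rng.normal(size=(m, n))
y = np.where(rng.random(n) < 0.5, 1, -1)

I_pos = np.flatnonzero(y == 1)    # index set of class +1
I_neg = np.flatnonzero(y == -1)   # index set of class -1
A = X[:, I_pos]                   # m x n1 matrix of one class
B = X[:, I_neg]                   # m x n2 matrix of the other class
assert A.shape[1] + B.shape[1] == n   # n1 + n2 = n
```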
Linear BDTWSVM
Generally, the linear BDTWSVM also finds two nonparallel hyperplanes (1), which are optimized by the following two QPPs, where the relative density value of each point x_k is obtained by the above relative density estimation method (see Eq. (21)), and c1 and c2 are positive penalty factors given by the user.
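Since the QPP display is elided in this snippet, the following is only a plausible reconstruction of the first density-weighted problem, our reading rather than the authors' exact formulation. Writing d_k for the relative density value of x_k, the natural modification of the TWSVM problem is to replace the unit target margin with d_k:

```latex
% Hypothetical density-weighted variant of the first TWSVM problem:
% denser points of the opposite class (larger d_k) demand a larger margin.
\begin{aligned}
\min_{w_1,\, b_1,\, q} \quad & \tfrac{1}{2}\,\lVert A w_1 + e_1 b_1 \rVert^2 + c_1 \textstyle\sum_{k \in I_2} q_k \\
\text{s.t.} \quad & -(w_1^\top x_k + b_1) + q_k \ge d_k, \qquad q_k \ge 0, \quad k \in I_2.
\end{aligned}
```

With all d_k = 1 this reduces term by term to the standard TWSVM problem, consistent with the degeneration property claimed in the introduction and conclusions.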
Before optimizing the primal
Connection to other methods
In this section, we discuss the relationship between BDTWSVMs and some other methods.
Experiments
In this section, we report the evaluation results of BDTWSVMs in comparison with SVMs, TWSVMs, and LCTSVMs on both synthetic toy data sets and real-world benchmark data sets. Note that, for the nonlinear case, only the Gaussian kernel is employed in the simulations. In addition, we consider two types of TWSVMs: the first type (denoted TWSVM-As) is that in [6], in which the regularization terms
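Since only the Gaussian kernel is employed in the nonlinear experiments, its computation can be sketched as follows; the function name and the width parameter gamma are our assumptions (the paper's exact parametrization of the kernel width is not shown in this snippet).

```python
import numpy as np

def gaussian_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - z_j||^2).

    X : (n1, m) array of points (rows are samples here, for convenience).
    Z : (n2, m) array of points.
    """
    # pairwise squared Euclidean distances via broadcasting
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)
```

In a TWSVM-type solver, the training kernel matrix K = gaussian_kernel(X, X) replaces the inner products of the linear case; the diagonal is identically 1 and all entries lie in (0, 1].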
Conclusions
In this paper, we have proposed the BDTWSVMs for binary classification. The proposed BDTWSVMs inherit good properties from TWSVMs; for instance, BDTWSVMs only need to optimize two small-sized SVM-type QPPs. The difference between BDTWSVMs and TWSVMs is that the relative density degrees of all training points, and the distances from test points to the pair of hyperplanes, are extracted by using the intra-class graph and kernel density estimation. BDTWSVMs degenerate to classical TWSVMs if
Acknowledgments
The authors would like to thank the anonymous reviewers for their constructive comments and suggestions. This work has been partly supported by the Innovative Project of Shanghai Municipal Education Commission (11YZ81), the foundation of SHNU (SK201204), and the Shanghai Leading Academic Discipline Project (S30405).
Xinjun Peng received the M.S. degree in mathematics from the Yunnan University, Kunming, in 2005, and the Ph.D. from the Shanghai University, Shanghai, China, in 2008. Currently he is a lecturer in the College of Information Science and Engineering, Guangxi University for Nationalities. His research interests are in the area of applied mathematics, neural networks, statistical learning theory, and support vector machines.
References (34)
- Application of smoothing technique on twin support vector machines, Pattern Recognition Lett. (2008)
- Newton's method for nonparallel plane proximal classifier with unity norm hyperplanes, Signal Process. (2010)
- Nonparallel plane proximal classifier, Signal Process. (2009)
- A support vector machine classifier and its geometric approaches, Inf. Sci. (2010)
- Localized twin SVM via convex minimization, Neurocomputing (2011)
- TSVR: an efficient twin support vector machine for regression, Neural Netw. (2010)
- Primal twin support vector regression and its sparse approximation, Neurocomputing (2010)
- TPMSVM: a novel twin parametric-margin support vector machine for pattern recognition, Pattern Recognition (2011)
- Efficient twin parametric insensitive support vector regression model, Neurocomputing (2012)
- Building sparse twin support vector machine classifiers in primal space, Inf. Sci. (2011)
- Reduced twin support vector regression, Neurocomputing
- Discriminatively regularized least-squares classification, Pattern Recognition
- An Introduction to Support Vector Machines
- The Nature of Statistical Learning Theory
- Statistical Learning Theory
Dong Xu received the M.S. degree in mathematics in 2002 and the Ph.D. degree in 2005, both from Shanghai University, Shanghai, China. Currently he is a lecturer in the Department of Mathematics, Shanghai Normal University. His research interests include applied mathematics, neural networks, and software design.