
Neurocomputing

Volume 144, 20 November 2014, Pages 174-183

Feature selection for least squares projection twin support vector machine

https://doi.org/10.1016/j.neucom.2014.05.040

Abstract

In this paper, we propose a new feature selection approach for the recently proposed Least Squares Projection Twin Support Vector Machine (LSPTSVM) for binary classification. The 1-norm is used in our feature selection objective so that only the non-zero elements of the weight vectors are chosen as selected features. In addition, a Tikhonov regularization term is incorporated into the objective to reduce the singularity problems of the resulting Quadratic Programming Problems (QPPs) and to minimize the 1-norm measure. The resulting approach has a strong feature suppression capability and is called Feature Selection for Least Squares Projection Twin Support Vector Machine (FLSPTSVM). The solutions of FLSPTSVM are obtained by solving two smaller QPPs arising from the two primal QPPs, as opposed to the two dual QPPs solved in the Twin Support Vector Machine (TWSVM). Thus, FLSPTSVM is capable of generating sparse solutions, which means that it can reduce the number of input features in the linear case. Our linear FLSPTSVM can also be extended to the nonlinear case via the kernel trick; when a nonlinear classifier is used, the number of kernel functions required by the classifier is reduced. Our experiments on publicly available datasets demonstrate that FLSPTSVM achieves classification accuracy comparable to that of LSPTSVM while obtaining sparse solutions.

Introduction

The Support Vector Machine (SVM) [1], [2] is a widely used machine learning method developed on the basis of statistical learning theory and the structural risk minimization principle [3], [4]. Compared with other machine learning approaches such as Artificial Neural Networks (ANNs), SVM implements structural risk minimization rather than empirical risk minimization, and thus minimizes an upper bound on the generalization error. As a powerful tool for supervised learning, SVM copes well with small samples, nonlinearity, the curse of dimensionality, overfitting, and local minima [5]. It has been successfully applied to a variety of real-world problems such as face recognition [6], [7], [8], text categorization [9], bioinformatics [10], [11], [12], time series prediction [13], and regression estimation [14], [15].

However, one of the main challenges for the classic SVM is its high computational complexity, which is at most O(N³), where N is the number of training samples [16]. This drawback restricts the application of SVM to large-scale problems. In the last few years, a series of modified SVM algorithms have been developed to improve computational efficiency. The Proximal Support Vector Machine (PSVM) [17] solves a set of linear equations rather than a convex optimization problem, and its time complexity is O(n³), where n is the dimension of the samples. In essence, PSVM classifies the samples by two parallel hyperplanes while guaranteeing the maximum margin. The Generalized Eigenvalue Proximal SVM (GEPSVM) [18] is an extension of PSVM that generates two nonparallel hyperplanes so that each hyperplane is close to its own class and as far as possible from the other class. Subsequently, Jayadeva et al. [19] proposed the Twin Support Vector Machine (TWSVM), which seeks two nonparallel planes by solving two related SVM-type problems, each of which is smaller than a classic SVM. The major advantage of TWSVM is that it is approximately four times faster than the classic SVM. In order to further reduce the computational cost of TWSVM, the Least Squares Twin SVM (LSTSVM) [20] was proposed. LSTSVM trains extremely fast since its separating hyperplanes are determined by solving a set of linear equations; essentially, it solves two related PSVM-type problems. Shao et al. [21] proposed a modified TWSVM to improve classification performance, named the Twin Bounded Support Vector Machine (TBSVM). Ye et al. [22] proposed a multi-plane learning approach called the Localized Twin SVM via Convex Minimization (LCTSVM), which uses selectively generated points to train the classifier; its major advantage is resistance to outliers.
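To make the contrast between QPP-based and linear-equation-based training concrete, the sketch below computes LSTSVM-style nonparallel planes from two linear systems. It is a minimal illustration under our own variable names, following the commonly cited closed form, and is not the authors' reference implementation.

```python
import numpy as np

def lstsvm_planes(A, B, c1=1.0, c2=1.0):
    """Sketch of LSTSVM-style training: each plane comes from one linear solve.

    A: (m1, n) samples of class +1; B: (m2, n) samples of class -1.
    Returns (w1, b1) and (w2, b2) for the two nonparallel hyperplanes.
    """
    E = np.hstack([A, np.ones((A.shape[0], 1))])  # augmented matrix [A e]
    F = np.hstack([B, np.ones((B.shape[0], 1))])  # augmented matrix [B e]
    # Plane 1: close to class +1, at distance from class -1.
    z1 = -np.linalg.solve(F.T @ F + (1.0 / c1) * (E.T @ E),
                          F.T @ np.ones(F.shape[0]))
    # Plane 2: close to class -1, at distance from class +1.
    z2 = np.linalg.solve(E.T @ E + (1.0 / c2) * (F.T @ F),
                         E.T @ np.ones(E.shape[0]))
    return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])
```

A new point x is then assigned to the class whose plane it is nearer to, i.e. by comparing |x·w1 + b1|/‖w1‖ with |x·w2 + b2|/‖w2‖.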

Different from TWSVM, which improves GEPSVM by using an SVM-type formulation, the Multi-weight Vector Projection Support Vector Machine (MVSVM) [23] was proposed to enhance the performance of GEPSVM by seeking one weight vector instead of a hyperplane for each class, so that the samples of one class are closest to their class mean while the samples of different classes are separated as far as possible. The weight vectors of MVSVM are found by solving a pair of generalized eigenvalue problems.
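As a schematic of this kind of computation, the snippet below finds a single projection direction from a generalized eigenvalue problem. The scatter matrices here are Fisher-style stand-ins chosen for illustration; MVSVM's exact matrices differ in detail.

```python
import numpy as np
from scipy.linalg import eigh

def projection_direction(A, B):
    """Weight vector from a generalized eigenvalue problem (illustrative only).

    Seeks w maximizing the separation between the class means relative to the
    within-class scatter of class A, via Sb w = lambda Sw w.
    """
    mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
    Ac = A - mu_a
    Sw = Ac.T @ Ac                     # within-class scatter of class A
    d = (mu_a - mu_b).reshape(-1, 1)
    Sb = d @ d.T                       # between-class separation term
    # eigh solves the symmetric generalized problem; eigenvalues ascend,
    # so the last column is the direction with the largest ratio.
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(Sw.shape[0]))
    return vecs[:, -1]
```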

Further, inspired by MVSVM and TWSVM, Chen et al. [16] proposed the Projection Twin Support Vector Machine (PTSVM). PTSVM solves two related SVM-type problems to obtain the two projection directions, whereas MVSVM needs to solve two generalized eigenvalue problems. Like TWSVM, PTSVM is implemented by solving two smaller QPPs. Also, by using a recursive algorithm, PTSVM can generate multiple projection directions. Experimental results in [16] show that PTSVM has comparable or better performance than four other state-of-the-art Multiple-surface Classification (MSC) algorithms (i.e. GEPSVM, TWSVM, LSTSVM and MVSVM). In order to further enhance the performance of PTSVM, Shao et al. [24] proposed a least squares version of PTSVM, called the Least Squares Projection Twin Support Vector Machine (LSPTSVM). LSPTSVM solves two modified primal problems via two sets of linear equations, whereas PTSVM needs to solve two QPPs along with two sets of linear equations.
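The least-squares reduction can be made explicit: substituting the equality constraint into the objective leaves an unconstrained quadratic whose stationarity condition is one linear system per direction. The sketch below illustrates this for the first direction; it is our own derivation-based illustration (with a hypothetical Tikhonov parameter c3), not the authors' code.

```python
import numpy as np

def lsptsvm_direction(A, B, C1=1.0, c3=1e-3):
    """Sketch: one LSPTSVM-style projection direction from a single linear solve.

    Substituting the equality constraint xi = e2 + G w1 into the objective
    (1/2) w1' S1 w1 + (c3/2) ||w1||^2 + (C1/2) ||xi||^2 and setting the
    gradient to zero gives (S1 + c3 I + C1 G'G) w1 = -C1 G' e2.
    """
    n = A.shape[1]
    mean_A = A.mean(axis=0, keepdims=True)
    Ac = A - mean_A                             # class +1 samples, centered
    S1 = Ac.T @ Ac                              # within-class scatter of class +1
    G = B - np.ones((B.shape[0], 1)) @ mean_A   # class -1 shifted by class +1 mean
    e2 = np.ones(B.shape[0])
    w1 = np.linalg.solve(S1 + c3 * np.eye(n) + C1 * (G.T @ G), -C1 * (G.T @ e2))
    return w1  # the mirror-image solve with A and B swapped gives w2
```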

In the past few years, research in automatic feature selection has attracted more and more attention. In particular for online applications, feature selection is a critical step for data classification, especially when the classification problems involve high-dimensional spaces. It is well known that the 1-norm SVM has advantages over the 2-norm SVM: it generates sparse solutions, which makes the classifier easier to store and faster to compute, and it suppresses features [25], [26]. In this way, only the non-zero elements of the weight vectors are chosen as selected features, so the 1-norm SVM can automatically select relevant features by estimating their coefficients. Especially when there are many noise variables among the input features, the 1-norm SVM has significant advantages over the 2-norm SVM, because the latter does not select significant variables [27]. A feature selection method for nonparallel plane support vector machine classification (FLSTSVM) [28] was proposed for strong feature suppression, in which a Tikhonov regularization term is incorporated into the objective of LSTSVM so that FLSTSVM minimizes its 1-norm measure. The solution of FLSTSVM is obtained by solving two smaller QPPs arising from the two primal QPPs, so, unlike in TWSVM, there is no need to solve the two dual problems, and sparse solutions result.
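The sparsity effect is easy to reproduce with any off-the-shelf linear classifier; the generic demonstration below (scikit-learn on synthetic data, not FLSTSVM itself) shows the 1-norm zeroing out the weights of irrelevant features while the 2-norm merely shrinks them.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                  # 20 features, most irrelevant
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # only features 0 and 1 matter

l1 = LinearSVC(penalty="l1", dual=False, C=0.1).fit(X, y)
l2 = LinearSVC(penalty="l2", dual=False, C=0.1).fit(X, y)

print("nonzero weights with 1-norm:", np.count_nonzero(l1.coef_))  # a few
print("nonzero weights with 2-norm:", np.count_nonzero(l2.coef_))  # typically all 20
```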

In this paper, we propose a feature selection algorithm that produces sparse solutions for LSPTSVM. The squared 2-norm regularization terms in the primal problems of LSPTSVM are replaced with 1-norm ones, thereby helping suppress the input features. FLSPTSVM obtains its two projection directions by solving two primal rather than dual QPPs, each of which is smaller than that of PTSVM and comparable to that of LSPTSVM. This paper is organized as follows: Section 2 briefly reviews TWSVM, MVSVM, PTSVM, and LSPTSVM. Section 3 proposes our FLSPTSVM, and experimental results are described in Section 4. Finally, concluding remarks are given in Section 5.


Related work

In this section, we give a brief description of TWSVM, MVSVM, PTSVM and LSPTSVM.

We consider a binary classification problem in the n-dimensional real space $\mathbb{R}^n$. The set of training data points is represented by $X=\{x_j^{(i)} \mid i=1,2;\ j=1,2,\ldots,m_i\}$, where $x_j^{(i)}\in\mathbb{R}^n$ is the $j$th input belonging to class $i$ and $m=m_1+m_2$; $y_j\in\{+1,-1\}$ are the corresponding outputs. We further organize the $m_1$ inputs of class $+1$ into a matrix $A\in\mathbb{R}^{m_1\times n}$ and the $m_2$ inputs of class $-1$ into a matrix $B\in\mathbb{R}^{m_2\times n}$. The 2-norm of $x$ is denoted by $\|x\|$, and
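The snippet breaks off above; as a small concrete example of the notation it introduces (array names mirror the symbols above, and the data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))                        # m = 10 samples in R^3
y = np.array([1, 1, 1, 1, -1, -1, -1, -1, -1, -1])  # class labels

A = X[y == 1]    # m1 x n matrix of the class +1 inputs
B = X[y == -1]   # m2 x n matrix of the class -1 inputs
assert A.shape[0] + B.shape[0] == X.shape[0]        # m = m1 + m2
```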

Linear FLSPTSVM

Different from LSPTSVM, the objective functions (22), (23) of LSPTSVM are modified as follows:

$$(\text{FLSPTSVM1})\qquad \min_{w_1}\ \frac{1}{2}w_1^{T}S_1w_1+\varepsilon\|w_1\|_1+\frac{C_1}{2}e_2^{T}\xi^{2}\qquad \text{s.t.}\quad -\Big(Bw_1-\frac{1}{m_1}e_2e_1^{T}Aw_1\Big)+\xi=e_2$$

and

$$(\text{FLSPTSVM2})\qquad \min_{w_2}\ \frac{1}{2}w_2^{T}S_2w_2+\varepsilon\|w_2\|_1+\frac{C_2}{2}e_1^{T}\eta^{2}\qquad \text{s.t.}\quad \Big(Aw_2-\frac{1}{m_2}e_1e_2^{T}Bw_2\Big)+\eta=e_1$$

where $\xi^{2}$ and $\eta^{2}$ are taken elementwise.

Note that formulations (28), (29) differ slightly from (22), (23): FLSPTSVM aims to suppress the input features. In FLSPTSVM, the squared 2-norm regularization terms $\|w_1\|^{2}$ and $\|w_2\|^{2}$ are replaced by the 1-norm terms $\|w_1\|_1$ and $\|w_2\|_1$, respectively. This can
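The snippet above breaks off, but a standard device for handling the nondifferentiable 1-norm term (the usual textbook decomposition, not necessarily the authors' exact derivation) is to split each weight vector into nonnegative parts:

$$w_1 = u - v,\qquad u \ge 0,\ v \ge 0,$$

with $\|w_1\|_1$ replaced by the linear term $e^{T}(u+v)$ (the two coincide at the optimum). This turns (FLSPTSVM1) into a smooth QPP in $(u, v, \xi)$ at the cost of only added nonnegativity constraints.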

Experimental results

In this section, in order to demonstrate the performance of our approach, we report results on publicly available datasets: the UCI Repository [35], two artificial datasets, "Cross Plane" and "Complex Xor" [18], [23], and datasets from David Musicant's NDC Data Generator [36].

We focus on comparisons between the proposed algorithm and some state-of-the-art multiple-surface classification methods, including GEPSVM [18], TWSVM [19], LSTSVM [20], MVSVM [23], LSPTSVM [24] and FLSTSVM [28].

Conclusion

We have improved LSPTSVM and derived a new algorithm for binary classification, termed FLSPTSVM (Feature Selection LSPTSVM) in this paper. FLSPTSVM is an effective algorithm that finds two projection directions by solving two smaller QPPs. Like other multi-plane classifiers, FLSPTSVM is capable of dealing with XOR examples. Experimental results on the synthetic datasets, UCI datasets and NDC datasets demonstrate that our FLSPTSVM obtains classification accuracy comparable

Acknowledgment

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions. This work was partially supported by the Jiangsu Key Laboratory of Image and Video Understanding for Social Safety (Nanjing University of Science and Technology), Grant no. 30920130122006, the China Postdoctoral Science Foundation (Grant no. 2014M551599), the National Natural Science Foundation of China (Grant nos. 61272220, 61101197), and the Natural Science Foundation of Jiangsu


References (36)

  • Y. Shao et al.

    Least squares recursive projection twin support vector machine for classification

    Pattern Recognition

    (2012)
  • W.D. Zhou et al.

    Linear programming support vector machines

Pattern Recognition

    (2002)
  • Y. Tao et al.

    Quotient vs difference: comparison between the two discriminant criteria

    Neurocomputing

    (2010)
  • X. Peng et al.

    Bi-density twin support vector machines for pattern recognition

    Neurocomputing

    (2013)
  • C. Burges

    A tutorial on support vector machines for pattern recognition

    Data Mining and Knowledge Discovery

    (1998)
  • N. Cristianini et al.

    An Introduction to Support Vector Machines: And Other Kernel-Based Learning Methods

    (2000)
  • V.N. Vapnik

    Statistical Learning Theory

    (1998)
  • V.N. Vapnik

    The Nature of Statistical Learning Theory

    (1995)

    Jianhui Guo received his B.Sc., M.Sc. and Ph.D. from the Nanjing University of Science and Technology, Nanjing, China, in 2003, 2005 and 2008, respectively. In 2008, he joined the Nanjing Institute of Electronics Technology as a senior engineer. Currently, he is an Associate Professor in the School of Computer Science and Engineering, the Nanjing University of Science and Technology. His research interests include machine learning, data mining, pattern recognition, robotics and information fusion. In these areas, he has published over 10 journal and conference papers. (Email: [email protected])

Ping Yi received her B.Sc. and M.Sc. from the Nanjing University of Science and Technology, Nanjing, China, in 2003 and 2006, respectively. Currently, she is a Ph.D. student in the School of Instrument Science and Engineering, Southeast University, Nanjing, China. Her research interests include intelligent robots, machine learning and pattern recognition.

Ruili Wang received his Ph.D. in Computer Science from Dublin City University, Ireland. Currently he is a Senior Lecturer in Computer Science and Information Technology. His research interests include intelligent systems and speech processing. He has been awarded one of the most prestigious research grants in New Zealand, the Marsden Fund. He is an associate editor and editorial board member of five international journals.

Qiaolin Ye received his B.Sc. in Computer Science from the Nanjing Institute of Technology, China, and his M.Sc. in Computer Science and Technology from the Nanjing Forestry University, China. He received his Ph.D. from the Nanjing University of Science and Technology, China, in 2013. He is now an Associate Professor in Computer Science and Technology at the Nanjing Forestry University. His research interests include machine learning, data mining, pattern recognition and robotics.

    Chunxia Zhao received her Ph.D. in Electronic Engineering from the Harbin Institute of Technology in 1998. Since 2000, as a full Professor, she has been with the Computer Science and Technology Department at the Nanjing University of Science and Technology, Nanjing, China. She is now a senior member of China Computer Federation. Her current research interests include pattern recognition, image processing, artificial intelligence, and mobile robots.
