Feature selection for least squares projection twin support vector machine
Introduction
Support Vector Machines (SVMs) [1], [2] are a very useful machine learning method, developed on the basis of statistical learning theory and structural risk minimization [3], [4]. Compared with other machine learning approaches such as Artificial Neural Networks (ANNs), SVMs implement the structural risk minimization principle rather than the empirical risk minimization principle, and thus minimize an upper bound on the generalization error. As a powerful tool for supervised learning, SVMs can handle small-sample, nonlinear, and high-dimensional problems while resisting overfitting and local minima [5]. They have been successfully applied to a variety of real-world problems such as face recognition [6], [7], [8], text categorization [9], bioinformatics [10], [11], [12], time series prediction [13], and regression estimation [14], [15].
However, one of the main challenges for the classic SVM is its high computational complexity, which is at most O(m³), where m is the number of training samples [16]. This drawback restricts the application of SVM to large-scale problems. In recent years, a series of modified SVM algorithms have been developed to improve computational efficiency. Proximal Support Vector Machines (PSVM) [17] solve a set of linear equations rather than a convex optimization problem, with a time complexity of O(n³), where n is the dimension of the samples. In essence, PSVM classifies samples by two parallel hyperplanes while guaranteeing the maximum margin. The Generalized Proximal SVM (GEPSVM) [18] is an extension of PSVM that generates two nonparallel hyperplanes, each of which is closer to its own class and as far as possible from the other class. Subsequently, Jayadeva et al. [19] proposed the Twin Support Vector Machine (TWSVM). This algorithm seeks two nonparallel planes by solving two related SVM-type problems, each of which is smaller than a classic SVM; the major advantage of TWSVM is that it is approximately four times faster than the classic SVM. To further reduce the computational cost of TWSVM, a Least Squares version of TWSVM (LSTSVM) was proposed [20]. LSTSVM trains extremely fast since its separating hyperplanes are determined by solving a set of linear equations; in essence, it solves two related PSVM-type problems. Shao et al. [21] proposed a modified TWSVM, named the Twin Bounded Support Vector Machine (TBSVM), to improve classification performance. Ye et al. [22] proposed a multi-plane learning approach called Localized Twin SVM via Convex Minimization (LCTSVM), which uses selectively generated points to train the classifier. The major advantage of LCTSVM is that it is resistant to outliers.
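The roughly fourfold speedup of TWSVM follows from a simple cost argument: one QPP over all m samples is split into two QPPs over about m/2 samples each, and QP solving time scales roughly cubically. The following sketch (a rough cost model, not the authors' code; `qp_cost` is a hypothetical name) makes the arithmetic explicit.

```python
# Rough cost model: QP solving time scales ~O(m^3) in the number of samples m.
def qp_cost(m):
    return m ** 3

m = 1000                            # total number of training samples
svm_cost = qp_cost(m)               # classic SVM: one QPP over all m samples
twsvm_cost = 2 * qp_cost(m // 2)    # TWSVM: two QPPs, each over ~m/2 samples

# 2 * (m/2)^3 = m^3 / 4, hence the factor-of-four speedup.
print(svm_cost / twsvm_cost)        # -> 4.0
```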
Different from TWSVM, which improves GEPSVM by using an SVM-type formulation, the Multi-weight Vector Projection Support Vector Machine (MVSVM) [23] was proposed to enhance the performance of GEPSVM by seeking one weight vector instead of a hyperplane for each class, so that the samples of one class are closest to their class mean while the samples of different classes are separated as far as possible. The weight vectors of MVSVM can be found by solving a pair of eigenvalue problems.
Further, inspired by MVSVM and TWSVM, Chen et al. [16] proposed the Projection Twin Support Vector Machine (PTSVM). PTSVM solves two related SVM-type problems to obtain the two projection directions, whereas MVSVM needs to solve two generalized eigenvalue problems. Both PTSVM and TWSVM are implemented by solving two smaller QPPs. Moreover, by using the proposed recursive algorithm, PTSVM can generate multiple projection directions. Experimental results in [16] show that PTSVM has comparable or better performance compared with four other state-of-the-art Multiple-surface Classification (MSC) algorithms (i.e., GEPSVM, TWSVM, LSTSVM, and MVSVM). To further enhance the performance of PTSVM, Shao et al. [24] proposed a least squares version of PTSVM, called the Least Squares Projection Twin Support Vector Machine (LSPTSVM). LSPTSVM solves two modified primal problems via two sets of linear equations, whereas PTSVM needs to solve two QPPs along with two sets of linear equations.
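The computational appeal of least-squares formulations is that each subproblem reduces to one regularized linear system. A minimal NumPy sketch of this kind of system (illustrative only, not the paper's exact equations (22)–(23); `H`, `e`, and `c` are hypothetical stand-ins for a data matrix, a vector of ones, and a regularization parameter):

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((100, 5))   # hypothetical data matrix (100 samples, 5 features)
e = np.ones(100)                    # vector of ones, as in LS-type twin SVM formulations
c = 0.1                             # regularization parameter

# A least-squares formulation replaces a QPP with one linear system,
# here of the generic form (H^T H + c I) z = -H^T e.
z = np.linalg.solve(H.T @ H + c * np.eye(5), -H.T @ e)
print(z.shape)                      # (5,)
```

Solving such a 5×5 system is orders of magnitude cheaper than iterating a QP solver over the same data, which is why LSTSVM and LSPTSVM train much faster than their QPP-based counterparts.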
In the past few years, research on automatic feature selection has attracted more and more attention. In particular for online applications, feature selection is a critical step for data classification, especially when the classification problems have high-dimensional input spaces. It is well known that the 1-norm SVM has advantages over the 2-norm SVM, since the 1-norm SVM generates sparse solutions, which makes its classifiers easier to store, faster to compute, and able to suppress features [25], [26]. In this way, only the non-zero elements of the weight vectors correspond to selected features; the 1-norm SVM can therefore automatically select relevant features by estimating their coefficients. In particular, when there are many noise variables among the input features, the 1-norm SVM has significant advantages over the 2-norm SVM, since the latter does not select significant variables [27]. A feature selection method for nonparallel plane support vector machine classification (FLSTSVM) [28] was proposed for strong feature suppression, in which a Tikhonov regularization term is incorporated into the objective of LSTSVM so that FLSTSVM minimizes a 1-norm measure. The solution of FLSTSVM is obtained by solving two smaller QPPs arising from the two primal QPPs, so there is no need to solve the two dual problems, and sparse solutions result. This is different from the approach taken in TWSVM.
In this paper, we propose a feature selection algorithm for outputting sparse solutions to LSPTSVM. The 2-norm regularization terms are replaced with the 1-norm ones in the primal problems of LSPTSVM, thereby helping suppress the input features of LSPTSVM. FLSPTSVM obtains two planes by solving two primal rather than dual QPPs, each of which is smaller than that of PTSVM and comparable to that of LSPTSVM. This paper is organized as follows: Section 2 briefly discusses TWSVM, MVSVM, PTSVM, and LSPTSVM. Section 3 proposes our FLSPTSVM and experimental results are described in Section 4. Finally, concluding remarks are given in Section 5.
Section snippets
Related work
In this section, we give a brief description of TWSVM, MVSVM, PTSVM and LSPTSVM.
We consider a binary classification problem in the n-dimensional real space $\mathbb{R}^n$. The set of training data points is represented by $T = \{(x_1, y_1), \ldots, (x_m, y_m)\}$, where each input $x_i \in \mathbb{R}^n$ belongs to class $y_i$, and $y_i \in \{+1, -1\}$, $i = 1, \ldots, m$, are the corresponding outputs. We further organize the inputs of Class +1 into a matrix $A \in \mathbb{R}^{m_1 \times n}$ and the inputs of Class −1 into a matrix $B \in \mathbb{R}^{m_2 \times n}$. The 2-norm of a vector $x$ is denoted by $\|x\|$.
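The data organization above is a simple label-based split. A minimal sketch with toy data (hypothetical values, chosen only to illustrate the A/B partition):

```python
import numpy as np

# Toy binary data: inputs x_i in R^n with labels y_i in {+1, -1}.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([+1, +1, -1, -1])

A = X[y == +1]   # m1 x n matrix holding the Class +1 inputs
B = X[y == -1]   # m2 x n matrix holding the Class -1 inputs

print(A.shape, B.shape)  # (2, 2) (2, 2)
```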
Linear FLSPTSVM
In FLSPTSVM, the objective functions (22), (23) of LSPTSVM are modified into formulations (28), (29). Note that formulations (28), (29) differ only slightly from (22), (23): since FLSPTSVM aims to suppress the input features, the squared 2-norm regularization terms $\|w_1\|^2$ and $\|w_2\|^2$ are replaced by their 1-norm counterparts $\|w_1\|_1$ and $\|w_2\|_1$, respectively. This can
Experimental results
In this section, in order to demonstrate the performance of our approach, we report results on publicly available data sets: the UCI Repository [35], two artificial datasets, "Cross Plane" and "Complex Xor" [18], [23], and datasets from David Musicant's NDC Data Generator [36].
We focus on comparisons between the proposed algorithm and several state-of-the-art multiple-surface classification methods, including GEPSVM [18], TWSVM [19], LSTSVM [21], MVSVM [23], LSPTSVM [24], and FLSTSVM [28].
Conclusion
We have improved LSPTSVM and derived a new algorithm for binary classification, termed FLSPTSVM (Feature Selection LSPTSVM) in this paper. Our FLSPTSVM is an effective algorithm that finds two projection directions by solving two smaller QPPs. Similar to other multi-plane classifiers, FLSPTSVM is capable of dealing with XOR examples. Experimental results on the synthetic datasets, UCI datasets, and NDC datasets demonstrate that our FLSPTSVM obtains classification accuracy comparable to that of these state-of-the-art methods.
Acknowledgment
The authors would like to thank the anonymous reviewers for their constructive comments and suggestions. This work was partially supported by the Jiangsu Key Laboratory of Image and Video Understanding for Social Safety (Nanjing University of Science and Technology), Grant no. 30920130122006, the China Postdoctoral Science Foundation (Grant no. 2014M551599), the National Natural Science Foundation of China (Grant nos. 61272220, 61101197), and the Natural Science Foundation of Jiangsu Province.
References (36)
- Kernel subclass convex hull sample selection method for SVM on face recognition, Neurocomputing (2010)
- Regularized least squares Fisher linear discriminant with applications to image recognition, Neurocomputing (2013)
- Fast prediction of protein–protein interaction sites based on Extreme Learning Machines, Neurocomputing (2014)
- Support vector machines experts for time series forecasting, Neurocomputing (2003)
- Recursive robust least squares support vector regression based on maximum correntropy criterion, Neurocomputing (2012)
- Twin least squares support vector regression, Neurocomputing (2013)
- Recursive projection twin support vector machine via within-class variance minimization, Pattern Recognit. (2011)
- Least squares twin support vector machines for pattern classification, Expert Syst. Appl. (2009)
- Localized twin SVM via convex minimization, Neurocomputing (2011)
- Multi-weight vector projection support vector machines, Pattern Recognit. Lett. (2010)
- Least squares recursive projection twin support vector machine for classification, Pattern Recognit.
- Linear programming support vector machines, Pattern Recognit.
- Quotient vs. difference: comparison between the two discriminant criteria, Neurocomputing
- Bi-density twin support vector machines for pattern recognition, Neurocomputing
- A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Discov.
- An Introduction to Support Vector Machines: And Other Kernel-Based Learning Methods
- Statistical Learning Theory
- The Nature of Statistical Learning Theory
Jianhui Guo received his B.Sc., M.Sc. and Ph.D. from the Nanjing University of Science and Technology, Nanjing, China, in 2003, 2005 and 2008, respectively. In 2008, he joined the Nanjing Institute of Electronics Technology as a senior engineer. Currently, he is an Associate Professor in the School of Computer Science and Engineering, the Nanjing University of Science and Technology. His research interests include machine learning, data mining, pattern recognition, robotics and information fusion. In these areas, he has published over 10 journal and conference papers. (Email: [email protected])
Ping Yi received her B.Sc. and M.Sc. both from the Nanjing University of Science and Technology, Nanjing, China, in 2003 and 2006, respectively. Currently, she is a Ph.D. student in the School of Instrument Science and Engineering, Southeast University, Nanjing, China. Her research interests include intelligent robot, machine learning and pattern recognition.
Ruili Wang received his Ph.D. in Computer Science from Dublin City University, Ireland. Currently he is a Senior Lecturer in Computer Science and Information Technology. His research interests include intelligent systems and speech processing. He has been awarded one of the most prestigious research grants in New Zealand, Marsden Fund. He is an associate editor and member of editorial boards of 5 international journals.
Qiaolin Ye received his B.Sc. in Computer Science from the Nanjing Institute of Technology, China, M.Sc. in Computer Science and Technology from the Nanjing Forestry University, China. Also, he received his Ph.D. from the Nanjing University of Science and Technology, China, in 2013. He is now an Associate Professor in Computer Science and Technology at the Nanjing Forestry University. His research interests include machine learning, data mining, pattern recognition and robotics.
Chunxia Zhao received her Ph.D. in Electronic Engineering from the Harbin Institute of Technology in 1998. Since 2000, as a full Professor, she has been with the Computer Science and Technology Department at the Nanjing University of Science and Technology, Nanjing, China. She is now a senior member of China Computer Federation. Her current research interests include pattern recognition, image processing, artificial intelligence, and mobile robots.