Simultaneous feature selection and classification using kernel-penalized support vector machines
Introduction
Classification is one of the most important data mining tasks. The performance of the resulting models depends, among other elements, on an appropriate selection of the most relevant features, which is a combinatorial problem in the number of original features. Feature selection offers the following advantages [1]:
- A low-dimensional representation reduces the risk of overfitting [5], [10].
- Using fewer features decreases the model’s complexity, which improves its generalization ability.
- A low-dimensional representation requires less computational effort.
Among existing classification methods, support vector machines (SVMs) provide several advantages, such as good generalization to new objects, the absence of local minima, and a representation that depends on only a few parameters [21]. However, in its standard formulation this method does not determine the importance of the features used [10] and is therefore not suitable for feature selection.
This limitation has motivated the development of several approaches to feature selection using SVMs (see, e.g., [6]). These methods generally work as filters, selecting features from a high-dimensional feature space before the subsequent classifier is designed. They provide a feature ranking but do not consider which combination of variables optimizes classification performance. In this paper a novel embedded method for feature selection using SVMs for classification problems is introduced. This method, called kernel-penalized SVM (KP-SVM), simultaneously determines a classifier with high classification accuracy and an adequate feature subset by penalizing each feature’s use in the dual formulation of the respective mathematical model. In numerical experiments on four well-known data sets, KP-SVM outperforms existing approaches.
This paper is structured as follows. Section 2 introduces SVM for classification. Recent developments for feature selection using SVMs are reviewed in Section 3. KP-SVM, the proposed embedded method for feature selection based on SVM, is presented in Section 4. Section 5 provides experimental results using four real-world data sets. Several important aspects that arise from this work are discussed in Section 6. A summary of this paper can be found in Section 7, where we provide its main conclusions and address future developments.
Section snippets
Classification with SVM
Vapnik [21] developed SVMs for binary classification. This section introduces the respective approach using the following terminology. Given training vectors x_i ∈ ℝ^n, i = 1, …, m, and a label vector y with y_i ∈ {−1, +1}, SVM provides the optimal hyperplane f(x) = wᵀ·x + b that aims to separate the training patterns. In the case of linearly separable classes this hyperplane maximizes the sum of the distances to the closest positive and negative training patterns. This sum is called the margin. To construct the
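For linearly separable data, the margin-maximizing hyperplane described above can be recovered numerically with a hard-margin approximation. A minimal sketch using scikit-learn (not part of the paper; a large C approximates the hard-margin case):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two clusters in R^2.
X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [3., 3.], [3., 4.], [4., 3.]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin SVM described in the text.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                   # normal vector of f(x) = w^T x + b
b = clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)   # distance between the two margin hyperplanes

print("w =", w, "b =", b, "margin =", margin)
```

For this toy set the support vectors are the points closest to the opposite class, and the margin equals 2/‖w‖, the quantity the SVM maximizes.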
Feature selection for SVMs
According to [5], [6], there are three main directions for feature selection: filter, wrapper, and embedded methods. In this section we provide a brief overview of each of these approaches and present the methods that are compared with the proposed technique in this paper. The first direction (filter methods) uses statistical properties of the features in order to filter out poorly informative ones. This is usually done before applying any classification algorithm.
The Fisher
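The Fisher criterion referred to above scores each feature by the gap between its class means relative to the within-class variances. A minimal sketch (the exact normalization varies across references, so treat this as one common form):

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature Fisher criterion: squared class-mean gap over summed class variances."""
    pos, neg = X[y == 1], X[y == -1]
    num = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
    den = pos.var(axis=0) + neg.var(axis=0) + 1e-12  # guard against zero variance
    return num / den

rng = np.random.default_rng(0)
# Feature 0 separates the classes (means 0 vs. 3); feature 1 is pure noise.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal([3, 0], 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
scores = fisher_score(X, y)
print(scores)  # feature 0 should score far higher than feature 1
```

As a filter, one ranks features by this score and keeps the top k before training any classifier, which is exactly why the ranking ignores feature interactions.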
The proposed method for feature selection
An embedded method for feature selection using SVMs is proposed in this section. The reasoning behind this approach is that classification performance can be improved by eliminating, via optimization of the kernel function, the features that degrade the generalization ability of the classifier. The main idea is to penalize the use of features in the dual formulation of SVMs, using a gradient descent approximation for kernel optimization and feature elimination. The proposed method attempts to find the best
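The full objective and update rule of KP-SVM appear only in the body of the paper. As a heavily simplified sketch of the mechanism described above, the code below assumes an anisotropic Gaussian kernel with one non-negative width per feature, takes one illustrative gradient step on those widths against the standard SVM dual, and discards features whose width falls below a threshold eps; none of these choices should be read as the paper’s exact formulation:

```python
import numpy as np

def aniso_rbf(X, theta):
    """Anisotropic RBF: K[i,k] = exp(-sum_j theta_j * (x_ij - x_kj)^2)."""
    d = (X[:, None, :] - X[None, :, :]) ** 2      # (m, m, n) per-feature squared gaps
    return np.exp(-(d * theta).sum(axis=2))

def dual_objective(alpha, y, K):
    """Standard SVM dual: sum(alpha) - 1/2 * alpha^T (yy^T * K) alpha."""
    return alpha.sum() - 0.5 * alpha @ ((np.outer(y, y) * K) @ alpha)

def kernel_grad_step(X, y, alpha, theta, lr=0.1, eps=1e-3):
    """One illustrative gradient step on the kernel widths, then feature pruning."""
    d = (X[:, None, :] - X[None, :, :]) ** 2
    K = aniso_rbf(X, theta)
    A = np.outer(alpha * y, alpha * y)
    # dK/dtheta_j = -d_j * K, so the dual's gradient w.r.t. theta_j is
    # +1/2 * sum_ik A_ik * d_ikj * K_ik.
    grad = 0.5 * np.einsum("ik,ikj->j", A * K, d)
    theta = np.maximum(theta + lr * grad, 0.0)    # keep widths non-negative
    theta[theta < eps] = 0.0                      # width ~ 0 means the feature is dropped
    return theta

X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
y = np.array([1, 1, -1, -1])
alpha = np.full(4, 0.5)   # stand-in dual variables; in KP-SVM alpha comes from SVM training
theta = kernel_grad_step(X, y, alpha, np.ones(2))
print("updated widths:", theta)
```

A feature whose width reaches zero no longer enters the kernel at all, which is the sense in which the kernel penalty performs feature elimination.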
Experimental results
We applied the proposed approach for feature selection to four well-known benchmark data sets: two real-world data sets from the UCI repository [8] and two DNA microarray data sets. These data sets have already been used to compare feature selection algorithms (see, for example, [15], [22]).
For model selection we follow the procedure presented in [16]: training and test subsets are obtained from the original data set by dividing it randomly, preserving the proportions of the different classes.
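The stratified random split described above can be sketched as follows (a from-scratch version for clarity; library routines such as scikit-learn’s train_test_split with stratify= provide the same behavior):

```python
import numpy as np

def stratified_split(y, test_frac=0.3, seed=0):
    """Random train/test index split that preserves the class proportions of y."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.where(y == label)[0])
        cut = int(round(len(idx) * test_frac))
        test_idx.extend(idx[:cut])      # test_frac of each class goes to the test set
        train_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)

# Usage: a 70/30 class imbalance is preserved in both subsets.
y = np.array([-1] * 70 + [1] * 30)
train_idx, test_idx = stratified_split(y, test_frac=0.3, seed=0)
print(len(train_idx), len(test_idx), (y[test_idx] == 1).sum())
```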
Discussions
The main advantage of KP-SVM in terms of computational effort is that it automatically obtains an optimal feature subset, avoiding a validation step to determine how many ranked features should be used for classification. However, several parameters must be tuned to obtain the final solution. In this section we study the method’s performance by varying one parameter at a time to assess its influence on the final solution.
For the different data sets we vary the parameters C2, β, γ, ϵ,
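A one-at-a-time sensitivity sweep of this kind can be sketched as follows; the parameter names (C2, beta, gamma, eps) come from the text, but the default values and grids are invented for illustration, and `evaluate` is a hypothetical stand-in for training KP-SVM with the given setting and returning validation accuracy:

```python
# Hypothetical one-at-a-time sensitivity study; all numeric values are illustrative.
defaults = {"C2": 1.0, "beta": 0.5, "gamma": 0.1, "eps": 1e-3}
grids = {"C2": [0.1, 1.0, 10.0], "beta": [0.1, 0.5, 1.0],
         "gamma": [0.01, 0.1, 1.0], "eps": [1e-4, 1e-3, 1e-2]}

def sweep(evaluate):
    """Vary one parameter at a time, holding the others at their defaults."""
    results = {}
    for name, values in grids.items():
        row = []
        for v in values:
            params = dict(defaults, **{name: v})  # only `name` departs from defaults
            row.append((v, evaluate(params)))
        results[name] = row
    return results

# Dummy evaluate (returns gamma) just to show the output shape.
demo = sweep(lambda params: params["gamma"])
print(demo["gamma"])
```

This isolates each parameter’s influence on the final solution at the cost of ignoring interactions between parameters, which a full grid search would capture.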
Conclusions
In this paper we present a novel embedded method for feature selection using SVMs. A comparison with other feature selection techniques shows the advantages of our approach:
- Empirically, KP-SVM outperforms other filter and wrapper techniques, thanks to its ability to adjust better to the data by optimizing the kernel function while simultaneously selecting an optimal feature subset for classification.
- Unlike most feature selection methods, it is not necessary to set the number of features to be selected in advance.
Acknowledgements
Support from the Chilean Instituto Sistemas Complejos de Ingeniería (ICM: P-05-004-F, CONICYT: FBO16) is gratefully acknowledged (www.sistemasdeingenieria.cl). The first author also acknowledges a grant provided by CONICYT for his Ph.D. studies in Engineering Systems at Universidad de Chile.
References (24)
- Selection of relevant features and examples in machine learning, Artificial Intelligence (1997)
- FS-SFS: a novel feature selection method for support vector machines, Pattern Recognition (2006)
- A wrapper method for feature selection using support vector machines, Information Sciences (2009)
- Multiclass SVM-RFE for product form feature selection, Expert Systems with Applications (2008)
- A novel feature selection approach: combining feature wrappers and filters, Information Sciences (2007)
- Feature selection for multi-label naive Bayes classification, Information Sciences (2009)
- Feature selection via concave minimization and SVMs
- Adaptive scaling for feature selection in SVMs (2002)
- Choosing multiple parameters for support vector machines, Machine Learning (2002)
- An introduction to variable and feature selection, Journal of Machine Learning Research (2003)
- Feature Extraction, Foundations and Applications
- Gene selection for cancer classification using support vector machines, Machine Learning
Cited by (235)
- Predicting lodging severity in dry peas using UAS-mounted RGB, LIDAR, and multispectral sensors, Remote Sensing Applications: Society and Environment (2024)
- A novel enhanced hybrid clinical decision support system for accurate breast cancer prediction, Measurement: Journal of the International Measurement Confederation (2023)
- A new population initialization of metaheuristic algorithms based on hybrid fuzzy rough set for high-dimensional gene data feature selection, Computers in Biology and Medicine (2023)
- Linear Cost-sensitive Max-margin Embedded Feature Selection for SVM, Expert Systems with Applications (2022)
1. The author is presently affiliated with NetApp Bangalore, India, Advanced Technology Group.